Sr. Hadoop Developer Resume
Cleveland, OH
SUMMARY:
- Experienced Hadoop developer with over 7 years of programming experience and 5 years of experience with the Hadoop ecosystem and big data systems.
- In-depth experience and solid working knowledge of HDFS, MapReduce, Hive, Pig, Sqoop, YARN/MRv2, Spark, Kafka, Impala, HBase and Oozie.
- Good experience working with the Hortonworks, Cloudera and MapR distributions.
- Substantial experience writing MapReduce jobs in Java and working with Pig, Flume, Tez and Hive.
- Used Spark DataFrames, Spark SQL and the Spark RDD API to perform data transformations and build datasets.
- Extensively worked on Spark Streaming and Apache Kafka to fetch live stream data.
- Strong fundamental understanding of distributed computing and distributed storage concepts for highly scalable data engineering.
- Worked with Pig and Hive and developed custom UDFs for building various datasets.
- Worked on MapReduce framework using Java programming language extensively.
- Strong experience troubleshooting and performance-tuning Spark, MapReduce and Hive applications.
- Worked extensively with clickstream data to identify visitor behavior patterns and to enable the data science team to run predictive models.
- Worked on NoSQL data stores, primarily HBase, using the HBase Java API and Hive integration.
- Experienced in checking cluster status with the monitoring tools Cloudera Manager and Ambari.
- Implemented dynamic partitions and buckets in Hive for efficient data access.
- Significant experience working with cloud environments such as Amazon Web Services (AWS) EC2 and S3.
- Strong expertise in Unix shell script programming.
- Expertise in writing shell scripts and regular expressions.
- Skilled in visualizing data using Tableau, Power BI and MS Excel.
- Knowledge of Enterprise Data Warehouse (EDW) architecture, Teradata, and data modeling concepts such as star schema and snowflake schema.
- Highly proficient in Scala programming.
- Experience with web technologies including HTML, CSS, JavaScript, Ajax and JSON, and frameworks such as J2EE, AngularJS and Spring.
- Good knowledge of REST web services, SOAP, WSDL, XML parsers such as SAX and DOM, AngularJS, and responsive design with Bootstrap.
- Familiar with Agile and Waterfall methodologies. Handled several client-facing meetings, drawing on strong communication skills.
- Good experience in a customer support role, resolving production issues based on priority.
TECHNICAL SKILLS:
Hadoop/Big Data Ecosystems: MapReduce, Spark, Spark SQL, PySpark, SparkR, Pig, Hive, Sqoop, HBase, Flume, Kafka, Cassandra, YARN, Oozie, ZooKeeper, Elasticsearch
Languages: C, C++, Java, Scala, Python, C#, SQL, PL/SQL
Frameworks: J2EE, Spring, Hibernate, Angular JS
Cluster Management and Monitoring: Cloudera Manager, Hortonworks Ambari
Databases: Oracle 11g, MySQL, SQL Server
Development Tools: Eclipse, NetBeans, Visual Studio, IntelliJ IDEA, XCode
Build Tools: ANT, Maven, sbt, Jenkins
Application Server: Tomcat 6.0, WebSphere 7.0
Business Intelligence Tools: Tableau, Splunk, Power BI
Version Control: GitHub, Bitbucket, SVN
WORK EXPERIENCE:
Sr. Hadoop Developer
Confidential, Cleveland, OH
Responsibilities:
- Gathered user requirements and designed technical and functional specifications.
- Installed, configured and maintained Hadoop clusters for application development, along with Hadoop tools such as Hive, Pig, HBase, ZooKeeper and Sqoop.
- Loaded data from sources such as Teradata and DB2 into HDFS using Sqoop, and then into partitioned Hive tables.
- Created Hive tables and wrote Hive queries for data analysis to meet business requirements; used Sqoop to import and export data from Oracle and MySQL.
- Extended Hive and Pig core functionality with custom User Defined Functions (UDFs), User Defined Table-Generating Functions (UDTFs) and User Defined Aggregating Functions (UDAFs) written in Python.
- Imported and exported the analyzed data to relational databases using Sqoop for visualization and to generate reports for the BI team.
- Worked on importing and exporting data into HDFS and Hive using Sqoop.
- Used Flume to handle streaming data and loaded the data into Hadoop cluster.
- Developed and executed Hive queries for de-normalizing the data.
- Developed the Apache Storm, Kafka and HDFS integration project for real-time data analysis.
- Executed Hive queries through the Hive command line, the Hue web UI and Impala to read, write and query data in HBase.
- Developed Oozie workflows for daily incremental loads that pull data from Teradata and import it into Hive tables.
- Worked on a POC comparing the processing time of Impala with Apache Hive for batch applications, with a view to adopting Impala in the project.
- Worked on a 130-node cluster.
- Designed Apache Airflow entity resolution module for data ingestion into Microsoft SQL Server.
- Developed a batch processing pipeline to process data using Python and Airflow. Scheduled Spark jobs using Airflow.
- Involved in writing, testing, and running MapReduce pipelines using Apache Crunch.
- Managed and reviewed Hadoop log files; analyzed SQL scripts and designed the solution for the process using Spark.
- Created reports in Tableau to visualize the data sets created, and tested native Drill, Impala and Spark connectors.
- Developed various Python scripts to find vulnerabilities in SQL queries through SQL injection tests, permission checks and performance analysis.
- Imported data from different data sources into HDFS using Sqoop, performed transformations using Hive and MapReduce, and loaded the results into HDFS.
- Exported result sets from Hive to MySQL using the Sqoop export tool for further processing.
- Collected and aggregated large amounts of log data using Flume and staged the data in HDFS for further analysis.
Environment: Hadoop, YARN, HBase, Teradata, DB2, NoSQL, Kafka, Python, ZooKeeper, Oozie, Tableau, Apache Crunch, Apache Storm, MySQL, SQL Server, jQuery, JavaScript, HTML, Ajax and CSS.
Hadoop Developer
Confidential, Eagan, MN
Responsibilities:
- Worked on a live 24-node Hadoop cluster running on HDP 2.2.
- Built import and export jobs to copy data between RDBMS and HDFS using Sqoop.
- Used Sqoop jobs with incremental load to populate HAWQ external tables and load them into internal tables.
- Created external and internal tables using HAWQ.
- Worked with the Spark Core, Spark Streaming and Spark SQL modules of Spark.
- Hands-on experience in various big data application phases such as data ingestion, data analytics and data visualization.
- Experience in transferring data from RDBMS to HDFS and Hive tables using Sqoop.
- Migrated code from Hive to Apache Spark and Scala using Spark SQL and RDDs.
- Well versed in workflow scheduling and monitoring tools such as Oozie, Hue and ZooKeeper.
- Experience in working with Flume to load the log data from multiple sources directly into HDFS.
- Installed and configured MapReduce, Hive and HDFS; implemented CDH5 and HDP clusters.
- Assisted with performance tuning, monitoring, and troubleshooting.
- Experience in collecting, aggregating and moving data from various sources using Apache Flume and Kafka.
- Experience in streaming data to clusters through Kafka and Spark Streaming.
- Optimized HiveQL and Pig scripts by using execution engines such as Tez and Spark.
- Tested Apache Tez, an extensible framework for building high-performance batch and interactive data processing applications, on Pig and Hive jobs.
- Worked with AWS cloud services such as EC2, S3, EBS, RDS and VPC.
- Experienced in reviewing Hadoop log files to detect failures.
- Benchmarked the NoSQL databases Cassandra and HBase.
- Worked with Pig, Sqoop and the NoSQL database HBase to analyze the Hadoop cluster as well as big data.
- Knowledge of workflow schedulers such as Oozie, crontab and Autosys.
- Very good understanding of partitioning and bucketing concepts in Hive; designed both managed and external tables in Hive to optimize performance.
- Created Hive tables and worked on them for data analysis to meet the business requirements.
- Developed a data pipeline using Spark and Hive to ingest, transform and analyze data.
- Experience in using SequenceFile, RCFile, Avro and HAR file formats.
- Hands-on experience writing Pig scripts to tokenize sensitive information using Protegrity.
- Used Flume to dump the application server logs into HDFS.
- Automated backups with Linux shell scripts that transfer data to an S3 bucket.
- Experience in UNIX Shell scripting.
- Hands-on experience using HP ALM; created test cases and uploaded them into HP ALM.
- Automated incremental loads to load data into production cluster.
Environment: Hadoop, MapReduce, AWS, HDFS, Hive, HBase, Sqoop, Pig, Flume, Oracle, Teradata, PL/SQL, Java, Shell Scripting, HP ALM.
Hadoop Developer
Confidential, St Louis, MO
Responsibilities:
- Installed and configured Hadoop MapReduce and HDFS, and developed multiple MapReduce jobs in Java for data cleansing and pre-processing.
- Importing and exporting data into HDFS and Hive using Sqoop.
- Used Multithreading, synchronization, caching and memory management.
- Applied Java application development skills with object-oriented analysis and was extensively involved throughout the Software Development Life Cycle (SDLC).
- Proactively monitored systems and services; contributed to the architecture design and implementation of the Hadoop deployment, configuration management, backup, and disaster recovery systems and procedures.
- Built big data clusters using Apache Spark for analytics.
- Developed Pig Latin scripts for the analysis of semi-structured data, and helped develop industry-specific UDFs (user-defined functions).
- Created Hive tables, loaded data into them and wrote Hive UDFs.
- Used Sqoop to import data into HDFS and Hive from other data systems.
- Extracted files from MongoDB through Sqoop, placed them in HDFS and processed them.
- Used Flume to collect, aggregate and store web log data from different sources such as web servers, mobile and network devices, and pushed it to HDFS.
- Implemented partitioning, dynamic partitions and buckets in Hive.
- Loaded and transformed large sets of structured, semi-structured and unstructured data.
- Supported MapReduce programs running on the cluster.
- Wrote shell scripts to monitor the health of Hadoop daemon services and respond accordingly to any warning or failure conditions.
- Involved in loading data from UNIX file system to HDFS, configuring Hive and writing Hive UDFs.
- Used Java and MySQL day to day to debug and fix issues with client processes.
- Managed and reviewed log files.
Environment: Hadoop, MapReduce, HDFS, Hive, Pig, Sqoop, Spark, MongoDB, Flume, HTML, XML, SQL, MySQL, Core Java, Eclipse, Shell scripting, UNIX.
Big Data Engineer/Developer
Confidential
Responsibilities:
- Developed several advanced MapReduce programs to process the data files received.
- Developed MapReduce programs for data analysis and data cleaning.
- Firm knowledge of various summarization patterns for calculating aggregate statistical values over datasets.
- Experience in implementing joins in the analysis of datasets to discover interesting relationships.
- Fully involved in the requirement analysis phase.
- Extended Hive and Pig core functionality by writing custom UDFs.
- Partitioned Hive tables and ran the scripts in parallel to reduce their run time.
- Strong expertise in internal and external Hive tables; created Hive tables to store the processed results in a tabular format.
- Implemented partitioning, dynamic partitions and buckets in Hive.
- Developed Pig Scripts and Pig UDFs to load data files into Hadoop.
- Analyzed the data by performing Hive queries and running Pig scripts.
- Developed Pig Latin scripts for the analysis of semi-structured and unstructured data.
- Strong knowledge of creating complex data pipelines using transformations, aggregations, cleansing and filtering.
- Experience in writing cron jobs to run at regular intervals.
- Developed MapReduce jobs for Log Analysis, Recommendation and Analytics.
- Experience in using Flume to efficiently collect, aggregate and move large amounts of log data.
- Involved in loading data from edge node to HDFS using shell scripting.
- Worked with application teams to install operating system, Hadoop updates, patches, version upgrades as required.
- Experience in managing and reviewing Hadoop log files.
- Responsible for cluster maintenance, adding and removing cluster nodes, cluster monitoring and troubleshooting, managing and reviewing data backups, and managing and reviewing Hadoop log files.
Environment: Hadoop 1.1.1, Java, Apache Pig 0.10.0, Apache Hive 0.10.0, MapReduce, HDFS, Flume 1.4.0, Git, UNIX Shell scripting, PostgreSQL, Linux.
Java Developer
Confidential
Responsibilities:
- Involved in analysis, design, implementation and bug-fixing activities.
- Designed the initial Web-WAP pages for a better UI as per the requirements.
- Participated in reviews of functional and technical specification documents and in code reviews.
- Acquired the required domain knowledge.
- Helped design basic class, sequence and event diagrams as part of the documentation.
- Held discussions and meetings with business analysts to understand the functionality involved in test case reviews.
- Developed SQL queries and stored procedures using PL/SQL to retrieve data from and insert data into multiple database schemas.
- Prepared the Support Guide containing the complete functionality.
Environment: Core Java, Apache Tomcat 5.1, Oracle 9i, JavaScript, HTML, PL/SQL, Rational Rose, Windows XP, UNIX.
