Hadoop Developer Resume
SUMMARY
- Around 5 years of professional experience in Information Technology, with expertise in Big Data on the Hadoop framework and in analysis, design, development, testing, documentation, deployment, and integration using SQL and Big Data technologies.
- Expertise in using major Hadoop ecosystem components such as HDFS, YARN, MapReduce, Hive, Sqoop, HBase, Spark, Spark SQL, Oozie, ZooKeeper, and Hue.
- Good understanding of distributed systems, HDFS architecture, and the internal workings of the MapReduce and Spark processing frameworks.
- Knowledge of ETL methods for data extraction, transformation, and loading in corporate-wide ETL solutions, and of data warehouse tools for reporting and data analysis.
- Developed data set processes for data modeling and mining; recommended ways to improve data reliability, efficiency, and quality.
- Good knowledge of Apache NiFi for automating data movement from RDBMS sources such as Teradata, Oracle, and SQL Server, and from flat files, into HDFS, and for building Hive tables on top of the landed data.
- Experience in importing and exporting data between HDFS and relational database systems using Sqoop, and in loading the imported data into partitioned Hive tables.
- Used the Infoworks (ingestion and analytics) tool to ingest data from sources such as Teradata, Oracle, SQL Server, DB2, and flat files, and to build, transform, and schedule pipelines and workflows.
- Worked on Microsoft Azure services such as HDInsight clusters, Blob storage, ADLS, Data Factory, and Logic Apps, and completed a POC on Azure Databricks.
- Extensive knowledge of writing Hadoop jobs for data analysis per business requirements using Hive; worked on HiveQL queries for data extraction and join operations, wrote custom UDFs as required, and have good experience optimizing Hive queries.
- Solid experience with partitioning and bucketing concepts in Hive; designed both managed and external Hive tables to optimize performance.
- Worked with various file formats such as delimited text, JSON, and XML; proficient with columnar formats such as RCFile, ORC, and Parquet; good understanding of compression codecs used in Hadoop processing such as gzip, Snappy, and LZO.
- Hands-on experience with Microsoft Azure cloud services, storage accounts, and virtual networks.
- Good at managing hosting plans for Azure infrastructure and at implementing and deploying workloads on Azure virtual machines (VMs).
- Worked with NoSQL databases such as HBase, creating HBase tables to load large sets of semi-structured data from various sources.
- Implemented an HBase cluster as part of a POC to address HBase limitations.
- Strong knowledge of Spark architecture and components; efficient in working with Spark Core and Spark SQL.
- Good knowledge of Scala's functional programming techniques such as anonymous functions (closures), higher-order functions, and pattern matching.
- Involved in converting Hive/SQL queries into Spark transformations using Spark DataFrames and Scala (see the sketch after this list).
- Good working experience with Spark (Spark Streaming, Spark SQL), Scala, and Kafka; worked on reading multiple data formats from HDFS using Scala.
- Used the Spark DataFrame API on the Cloudera platform to perform analytics on Hive data, and DataFrame operations to perform the required data validations.
- Extensive knowledge of RDBMSs such as Oracle, Microsoft SQL Server, and MySQL.
- Excellent understanding of job workflow scheduling and distributed coordination tools/services such as Oozie and ZooKeeper.
- Good understanding of NoSQL databases such as MongoDB, HBase, and Cassandra.
- Experienced in working with Amazon Web Services (AWS), using EC2 for compute and S3 for storage.
- Capable of using AWS services such as EMR, S3, and CloudWatch to run and monitor Hadoop and Spark jobs on AWS.
- Good understanding of AWS services such as Auto Scaling, Redshift, DynamoDB, and Route 53.
- Experience with operating systems: Linux (Red Hat) and UNIX.
- Experience in the complete project life cycle (design, development, testing, and implementation) of client-server and web applications; good programming skills with experience in SQL and Python.
- Worked with IDEs such as Eclipse, NetBeans, and IntelliJ, and with tools such as PuTTY and Git.
- Experienced in working within the SDLC using Agile and Waterfall methodologies.
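Representative of the Hive-to-Spark conversions listed above, a minimal sketch in Scala; the table and column names (sales, region, amount) are hypothetical placeholders, not taken from an actual project.

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object HiveToSparkExample {
  def main(args: Array[String]): Unit = {
    // SparkSession with Hive support so existing Hive tables are visible
    val spark = SparkSession.builder()
      .appName("HiveToSparkExample")
      .enableHiveSupport()
      .getOrCreate()

    // Equivalent of: SELECT region, SUM(amount) AS total_amount FROM sales GROUP BY region
    val salesDf = spark.table("sales") // hypothetical Hive table
    val totalsDf = salesDf
      .groupBy("region")
      .agg(sum("amount").alias("total_amount"))

    totalsDf.show()
    spark.stop()
  }
}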
TECHNICAL SKILLS
Programming Languages: Java (J2EE), Python, SQL, Scala
Hadoop Ecosystem: Spark, Spark SQL, PySpark, Hive, HBase, YARN, Oozie, ZooKeeper, Hue, Ambari Server
Relational Databases: MySQL, Teradata, Oracle
Public Cloud (AWS): EC2, IAM, S3, Auto Scaling, CloudWatch, Route 53, EMR, Redshift
NoSQL: HBase, Cassandra
Tools/IDEs: NetBeans, Eclipse, IntelliJ
Java Frameworks: Servlets, Hibernate, Spring, Struts
Methodologies: Agile, Waterfall model
Operating Systems: Windows, Red Hat Linux, UNIX
PROFESSIONAL EXPERIENCE
Confidential
Hadoop Developer
Responsibilities:
- Evaluated client needs and translated business requirements into functional specifications, onboarding clients onto the Hadoop ecosystem.
- Extracted and loaded data into HDFS using Sqoop imports from sources such as Teradata, Oracle, SQL Server, and DB2.
- Used Apache NiFi to automate data movement from RDBMS sources such as Teradata, Oracle, and SQL Server, and from flat files, into HDFS, and built Hive tables on top of the landed data.
- Used the Infoworks (ingestion and analytics) tool to ingest data from sources such as Teradata, Oracle, SQL Server, DB2, and flat files, and built pipelines and workflows to transform and schedule the data.
- Developed Hive UDFs to incorporate external business logic into Hive scripts and developed data set join scripts using Hive join operations.
- Created various Hive external and staging tables and joined them per requirements; implemented static partitioning, dynamic partitioning, and bucketing (see the sketch after this job entry).
- Worked on Microsoft Azure services such as HDInsight clusters, Blob storage, ADLS, Data Factory, and Logic Apps, and completed a POC on Azure Databricks.
- Worked with HDFS file formats such as Parquet and JSON for serialization and deserialization.
- Worked with various file formats such as delimited text, clickstream logs, Apache logs, Avro, JSON, and XML; proficient with columnar formats such as RCFile, ORC, and Parquet; good understanding of compression codecs used in Hadoop processing such as gzip, Snappy, and LZO.
- Designed and developed automated processes using shell scripts and scheduled them via crontab.
- Worked with Spark to improve the performance and optimization of existing algorithms in Hadoop using SparkContext, Spark SQL, DataFrames, pair RDDs, and Spark on YARN.
- Used Spark for interactive queries, processing of streaming data, and integration with popular NoSQL databases for high data volumes.
- Used Spark DataFrame operations to perform the required data validations and to run analytics on Hive data.
- Developed Apache Spark applications for processing data from various streaming sources.
- Applied strong knowledge of Spark architecture and components, working efficiently with Spark Core and Spark SQL.
- Migrated MapReduce jobs to Spark jobs to achieve better performance.
- Used Hue for running Hive queries; created day-level partitions in Hive to improve performance.
Environment: Hadoop, Microsoft Azure, Infoworks, Spark, Hive, Pig, HBase, Oozie, Sqoop, Oracle, Core Java, HDFS, Eclipse.
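A minimal Scala/Spark SQL sketch of the external-table and dynamic-partitioning pattern described in the bullets above; the database names, table names, columns, and HDFS path are hypothetical, and the target databases are assumed to already exist.

import org.apache.spark.sql.SparkSession

object PartitionedHiveTableExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("PartitionedHiveTableExample")
      .enableHiveSupport()
      .getOrCreate()

    // Allow dynamic partition inserts
    spark.sql("SET hive.exec.dynamic.partition=true")
    spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")

    // External table over Sqoop-landed data (hypothetical HDFS path and schema)
    spark.sql(
      """CREATE EXTERNAL TABLE IF NOT EXISTS staging.orders_ext (
        |  order_id BIGINT,
        |  customer_id BIGINT,
        |  amount DOUBLE,
        |  order_date STRING
        |) STORED AS PARQUET
        |LOCATION '/data/landing/orders'""".stripMargin)

    // Managed table partitioned by order date, populated via dynamic partitioning
    spark.sql(
      """CREATE TABLE IF NOT EXISTS curated.orders (
        |  order_id BIGINT,
        |  customer_id BIGINT,
        |  amount DOUBLE
        |) PARTITIONED BY (order_date STRING)
        |STORED AS ORC""".stripMargin)

    spark.sql(
      """INSERT INTO TABLE curated.orders PARTITION (order_date)
        |SELECT order_id, customer_id, amount, order_date
        |FROM staging.orders_ext""".stripMargin)

    spark.stop()
  }
}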
Confidential, Raleigh, NC
Hadoop Developer
Responsibilities:
- Evaluated client needs and translated business requirements into functional specifications, onboarding clients onto the Hadoop ecosystem.
- Extracted and loaded data into and out of HDFS using Sqoop import and export.
- Developed Hive UDFs to incorporate external business logic into Hive scripts and developed data set join scripts using Hive join operations.
- Created various Hive external and staging tables and joined them per requirements; implemented static partitioning, dynamic partitioning, and bucketing.
- Worked with HDFS file formats such as Parquet and JSON for serialization and deserialization.
- Worked with Spark in Scala to improve the performance and optimization of existing algorithms in Hadoop using SparkContext, Spark SQL, PySpark, pair RDDs, and Spark on YARN.
- Used Spark with Scala for interactive queries, processing of streaming data, and integration with popular NoSQL databases for high data volumes.
- Worked with NoSQL databases such as HBase, creating HBase tables to load large sets of semi-structured data from various sources.
- Implemented an HBase cluster as part of a POC to address HBase limitations.
- Used Spark DataFrame operations to perform the required data validations and to run analytics on Hive data.
- Developed Apache Spark applications in Scala for processing data from various streaming sources (see the Kafka streaming sketch after this job entry).
- Applied strong knowledge of Spark architecture and components, working efficiently with Spark Core and Spark SQL in Scala.
- Migrated MapReduce jobs to Spark (Scala) jobs to achieve better performance.
- Worked on designing MapReduce and YARN flows, writing MapReduce jobs, performance tuning, and debugging.
- Developed a NiFi workflow to pick up data from an SFTP server and send it to a Kafka broker.
- Developed Oozie workflows to run multiple Hive, Sqoop, and Spark jobs.
- Identified opportunities to improve infrastructure to make effective and efficient use of Microsoft Azure, Windows Server, Microsoft SQL Server, Microsoft Visual Studio, Windows PowerShell, and cloud infrastructure.
- Deployed Azure IaaS virtual machines (VMs) and cloud services (PaaS role instances) into secure VNets and subnets.
Environment: Hadoop (HDFS, MapReduce), YARN, Spark, Hive, Pig, HBase, Oozie, Hue, Sqoop, Kafka, Oracle, NiFi, Azure services.
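A minimal Scala sketch of consuming the Kafka topic fed by the NiFi SFTP-to-Kafka flow described above, using Spark Structured Streaming; the broker address and topic name are hypothetical, and the spark-sql-kafka connector is assumed to be on the classpath.

import org.apache.spark.sql.SparkSession

object KafkaStreamExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("KafkaStreamExample")
      .getOrCreate()

    // Subscribe to the topic populated by the NiFi SFTP-to-Kafka workflow
    val rawStream = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker1:9092") // hypothetical broker
      .option("subscribe", "ingest-topic")               // hypothetical topic
      .option("startingOffsets", "latest")
      .load()

    // Kafka values arrive as bytes; cast to string for downstream parsing
    val messages = rawStream.selectExpr("CAST(value AS STRING) AS message")

    // Write to the console here; a real job would write to HDFS or Hive
    val query = messages.writeStream
      .format("console")
      .outputMode("append")
      .start()

    query.awaitTermination()
  }
}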
Confidential, Albany, NY
Hadoop Developer
Responsibilities:
- Involved in the complete big data flow of the application, from ingesting upstream data into HDFS to processing and analyzing the data in HDFS.
- Configured Flume to extract data from web server output files and load it into HDFS.
- Used Flume to collect, aggregate, and store web log data from sources such as web servers and mobile and network devices, and pushed it into HDFS.
- Utilized Flume interceptors to filter the input data so that only the data needed for analytics was retained.
- Worked on analyzing the Hadoop cluster and different big data analytics tools, including Pig and Hive.
- Gained working experience with data streaming processes using Apache Spark and Hive.
- Worked with HDFS file formats such as Avro, SequenceFile, and JSON, and compression formats such as Snappy and bzip2.
- Developed Spark code using Scala and Spark SQL/Streaming for faster testing and processing of data.
- Analyzed the SQL scripts and designed the solution to implement them in Scala.
- Used Spark SQL to load JSON data, create Schema RDDs, load them into Hive tables, and handle structured data (see the sketch after this job entry).
- Tested Apache Tez for building high-performance batch and interactive data processing applications on Hive jobs.
- Developed Oozie workflows to automate loading data into HDFS and pre-processing it with Pig.
- Worked with Apache NiFi to decompress JSON files and move them from the local file system to HDFS.
- Gained experience moving raw data between different systems using Apache NiFi.
- Involved in loading data from the UNIX file system to HDFS using shell scripting.
- Used Elasticsearch for indexing and full-text search.
- Applied knowledge of AWS services such as EC2, S3, Auto Scaling, and DynamoDB.
Environment: Hadoop (HDFS, MapReduce), Spark, Hive, Scala, Cassandra, Python, Pig, Sqoop, Hibernate, Spring, Oozie, AWS services (EC2, S3, Auto Scaling, DynamoDB), Elasticsearch, UNIX shell scripting, Tez.
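A minimal Scala sketch of the JSON-to-Hive step described above, using the DataFrame API that superseded SchemaRDD in later Spark versions; the HDFS path, column names, and target table are hypothetical.

import org.apache.spark.sql.SparkSession

object JsonToHiveExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("JsonToHiveExample")
      .enableHiveSupport()
      .getOrCreate()

    // Infer a schema from JSON files landed on HDFS (hypothetical path)
    val eventsDf = spark.read.json("/data/raw/events/")

    // Register a temporary view so the data can be handled with Spark SQL
    eventsDf.createOrReplaceTempView("events_raw")
    val cleaned = spark.sql(
      "SELECT event_id, user_id, event_ts FROM events_raw WHERE event_id IS NOT NULL")

    // Persist as a Hive table for downstream HiveQL access
    cleaned.write.mode("overwrite").saveAsTable("analytics.events")

    spark.stop()
  }
}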
Confidential
Hadoop Developer
Responsibilities:
- Imported data from relational data sources to HDFS using Sqoop.
- Involved in collecting and aggregating large amounts of log data and staging data in HDFS for further analysis.
- Worked with the NoSQL database HBase, creating HBase tables to load large sets of semi-structured data from various sources.
- Integrated MapReduce with HBase to bulk-import data into HBase using MapReduce programs.
- Used Pig as an ETL tool for transformations, event joins, filtering, and pre-aggregations before storing the data in HDFS.
- Used Hive to query data moved into HBase from various sources.
- Migrated HiveQL queries to Impala to minimize query response time.
- Used Scala to convert Hive/SQL queries into RDD transformations in Apache Spark (see the sketch after this job entry).
- Developed Kafka producers and consumers, HBase clients, and Apache Spark and Hadoop MapReduce jobs, along with components on HDFS and Hive.
- Used the Spark API on Cloudera Hadoop YARN to perform analytics on data in Hive.
- Developed small distributed applications using ZooKeeper and scheduled workflows using Oozie.
- Developed several shell scripts that act as wrappers to start these Hadoop jobs and set configuration parameters.
- Involved in moving log files generated from various sources to HDFS through Flume for further processing.
- Developed test scripts in Python, prepared test procedures, analyzed test result data, and suggested improvements to the system and software.
Environment: Hadoop (HDFS, MapReduce), Hive, Pig, Sqoop, Flume, YARN, HBase, Oozie, Java, SQL scripting, Linux shell scripting, Eclipse, and Cloudera.
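As an illustration of the Hive/SQL-to-RDD conversions mentioned above, a minimal Scala sketch; the input path and the tab-delimited record layout are hypothetical.

import org.apache.spark.{SparkConf, SparkContext}

object SqlToRddExample {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("SqlToRddExample")
    val sc = new SparkContext(conf)

    // Equivalent of: SELECT page, COUNT(*) FROM logs GROUP BY page
    // over tab-delimited log lines of the form: timestamp <TAB> user <TAB> page
    val logs = sc.textFile("/data/logs/") // hypothetical HDFS path
    val pageCounts = logs
      .map(_.split("\t"))
      .filter(_.length >= 3)
      .map(fields => (fields(2), 1L))
      .reduceByKey(_ + _)

    pageCounts.take(20).foreach(println)
    sc.stop()
  }
}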
Confidential
Jr. Hadoop Developer
Responsibilities:
- Extensively involved in the design phase and delivered design documents.
- Worked on analyzing the Hadoop cluster and different big data analytics tools, including Pig, Hive, HBase, and Sqoop.
- Managed and reviewed Hadoop log files for the JobTracker, NameNode, Secondary NameNode, DataNodes, and TaskTrackers.
- Involved in loading data from the local file system to HDFS using HDFS shell commands.
- Involved in moving log files generated from various sources to HDFS through Sqoop and Flume for further processing.
- Involved in writing MapReduce programs for analytics and for structuring the data.
- Migrated existing SQL queries to HiveQL to move to the big data analytics platform.
- Created Hive tables, loaded data, and generated ad hoc reports from the table data.
- Extended Hive core functionality with custom UDFs.
- Wrote a Java program to retrieve data from HDFS and expose it via REST services (see the sketch after this job entry).
- Integrated multiple data sources (SQL Server, DB2, MySQL) into the Hadoop cluster and analyzed the data through Hive-HBase integration.
- Worked on partitioning and bucketing concepts in Hive and designed both managed and external Hive tables for optimized performance.
- Involved in writing, developing, and testing optimized Pig scripts.
- Managed and scheduled jobs on the Hadoop cluster using Oozie.
- Managed and monitored the Hadoop cluster through Cloudera Manager.
Environment: Apache Hadoop (HDFS, MapReduce), Cloudera Distribution of Hadoop (CDH), Hive, Flume, Sqoop, MySQL, Linux.
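A minimal sketch of retrieving data from HDFS with the Hadoop FileSystem API, the same API the Java program mentioned above would use, shown here in Scala to keep the examples in one language; the file path is hypothetical.

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}
import scala.io.Source

object HdfsReadExample {
  def main(args: Array[String]): Unit = {
    // Picks up core-site.xml / hdfs-site.xml from the classpath
    val conf = new Configuration()
    val fs = FileSystem.get(conf)

    // Hypothetical HDFS file produced by an upstream job
    val hdfsFile = new Path("/data/reports/daily_summary.txt")

    val in = fs.open(hdfsFile)
    try {
      // Print the first few lines; a REST layer would return these instead
      Source.fromInputStream(in).getLines().take(10).foreach(println)
    } finally {
      in.close()
      fs.close()
    }
  }
}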