
Hadoop Developer Resume

SUMMARY

  • 7+ years of professional IT experience across all phases of the Software Development Life Cycle, including 4+ years of data processing and analysis experience handling high-volume data across various technology stacks.
  • Experience in ingestion, storage, querying, processing, and analysis of big data, with hands-on Hadoop ecosystem development experience including MapReduce, HDFS, Hive, Pig, Sqoop, Flume, Oozie, Apache Spark, and Kafka.
  • Experience with distributed systems, large-scale non-relational data stores, data modeling, and big data systems.
  • Experience in designing and developing POCs in Spark using Scala to compare the performance of Spark with Hive and SQL/Oracle.
  • Experienced in importing and exporting data between Hive, HDFS, and RDBMS using Sqoop and Spark.
  • Strong experience and knowledge of real-time data analytics using Spark and Kafka.
  • Hands-on experience in Spark, with good knowledge of the Spark architecture and its in-memory processing.
  • Experienced in improving the performance and optimizing existing Hadoop algorithms using Spark Context, Spark SQL, Datasets, DataFrames, and pair RDDs.
  • Experienced in writing queries and sub-queries in SQL, Hive, and Spark, using Spark modules such as Spark Core, RDDs, Datasets, DataFrames, and Spark SQL.
  • Experienced in converting SQL stored procedures and Hive queries into Spark using DataFrames and Spark SQL (see the sketch after this list).
  • Experience in developing custom UDFs in Java to extend Hive and Pig Latin functionality.
  • Very good understanding of NoSQL databases such as MongoDB and HBase.
  • Good knowledge of and hands-on experience with ETL.
  • Experienced on both the Hortonworks and Cloudera platforms.
  • Expertise in RDBMS such as Oracle, MS SQL Server, MySQL, and DB2.
  • Experienced in use-case development under software methodologies such as Agile and Waterfall.
  • Team player with excellent communication, presentation, and interpersonal skills.
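A minimal sketch of the Hive-query-to-Spark conversion mentioned above, assuming a Hive-enabled SparkSession and hypothetical orders/customers tables; the same aggregation is shown first as raw Spark SQL and then as the equivalent DataFrame code:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object HiveToSparkSketch {
  def main(args: Array[String]): Unit = {
    // Spark session with Hive support so existing Hive tables are visible
    val spark = SparkSession.builder()
      .appName("HiveQueryToDataFrame")
      .enableHiveSupport()
      .getOrCreate()

    // Original Hive query (hypothetical tables), runnable as-is via Spark SQL
    val viaSql = spark.sql(
      """SELECT c.region, SUM(o.amount) AS total
        |FROM orders o JOIN customers c ON o.cust_id = c.cust_id
        |GROUP BY c.region""".stripMargin)

    // Equivalent DataFrame formulation, easier to unit test and compose
    val viaDf = spark.table("orders").alias("o")
      .join(spark.table("customers").alias("c"), col("o.cust_id") === col("c.cust_id"))
      .groupBy(col("c.region"))
      .agg(sum(col("o.amount")).alias("total"))

    viaSql.show()
    viaDf.show()
    spark.stop()
  }
}
```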

TECHNICAL SKILLS

Hadoop/Big Data: Hive, HBase, Pig, Impala, Spark Core, Spark SQL, Spark Streaming, Sqoop, YARN, Ambari, MapReduce, Kafka

Programming & Scripting Languages: Python, Scala, Shell scripting, SQL, HQL

Web Servers: WebLogic, WebSphere, Apache Tomcat

Web Services: SOAP, REST

Databases: Oracle, MySQL, DB2, MS-SQL Server

Version Control Systems: GitHub, Bitbucket

IDEs & Tools: Eclipse, IntelliJ, PyCharm, Jupyter, PuTTY

ETL/Other Tools: Erwin Data Modeler, ER Assistant, Informatica PowerCenter 8.6.1/9, SSIS (Visual Studio)

Business Intelligence Tools: Tableau, MicroStrategy, QlikView

Querying Tools: SQL Server Management Studio 2008/2012, Teradata SQL Assistant, SQL Plus, SQL Developer, PL/SQL

PROFESSIONAL EXPERIENCE

Confidential

Hadoop Developer

Responsibilities:

  • Responsible for building scalable distributed data solutions using Hadoop
  • Developing Spark programs using Scala APIs to compare the performance of Spark with Hive and Impala.
  • Developed a single Spark driver job to orchestrate multiple different jobs.
  • Worked extensively with Hive and Spark to import data from RDBMS sources and Hive views and to migrate ETL jobs onto HDFS (see the sketch after this list).
  • Involved in discussions with business analysts and information architects for bug validation and fixing.
  • Analyzed current data sources in depth to understand the structure of each data set.
  • Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs.
  • In-depth knowledge of Hadoop architecture and components such as HDFS, ApplicationMaster, NodeManager, ResourceManager, NameNode, DataNode, and MapReduce concepts.
  • Debugged and resolved issues reported by QA.
  • Ingested data into HDFS, then transformed and loaded it into target systems using Hive, MapReduce, Spark, and Sqoop.
  • Accomplished transformations with MapReduce, Spark, and Hive.
  • Used Hive for transformations, event joins, and pre-aggregations before storing the data in HDFS.
  • Configured NDM to receive files from upstream systems and transfer files to downstream systems.
  • Responsible for AutoSys JIL scripts and end-to-end testing of the code in AutoSys.
  • Owned version control: kept all code current in Bitbucket and generated artifact URLs for the production team.
  • Created Hive queries for performing data analysis and improving performance using tuning parameters.
  • Analyzed the data by performing Hive queries and running Pig scripts
  • Responsible for ETL jobs that load data from source systems into HDFS.
  • Responsible for moving transformed data into the data lake.
  • Migrated ETL processes from different data sources to Hive to enable easier data manipulation.
  • Trained and mentored developers on Hadoop framework, HDFS, MapReduce concepts and Hadoop Ecosystem.
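A minimal sketch of the RDBMS-to-HDFS ingestion pattern used here, via Spark's JDBC reader rather than the Sqoop CLI; the connection URL, credentials, table names, and partition bounds are hypothetical placeholders:

```scala
import org.apache.spark.sql.{SaveMode, SparkSession}

// Pull a table from Oracle over JDBC and land it in a Hive staging table.
object RdbmsToHiveIngest {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("RdbmsToHiveIngest")
      .enableHiveSupport()
      .getOrCreate()

    val source = spark.read.format("jdbc")
      .option("url", "jdbc:oracle:thin:@//dbhost:1521/ORCL") // placeholder URL
      .option("dbtable", "SALES.TRANSACTIONS")               // placeholder table
      .option("user", sys.env("DB_USER"))
      .option("password", sys.env("DB_PASS"))
      .option("fetchsize", "10000")    // rows per JDBC round trip
      .option("numPartitions", "8")    // parallel JDBC reads
      .option("partitionColumn", "TXN_ID") // placeholder split column
      .option("lowerBound", "1")
      .option("upperBound", "100000000")
      .load()

    // Overwrite the staging Hive table on each scheduled run
    source.write.mode(SaveMode.Overwrite).saveAsTable("staging.transactions")
    spark.stop()
  }
}
```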

Environment: Cloudera, HDFS, Hive, Spark Core, Spark SQL, Bitbucket, Shell Scripting, IntelliJ, AutoSys, Oracle, MS-SQL, Linux, JIRA.

Confidential

Hadoop/Spark Developer

Responsibilities:

  • Responsible for data ingestion from multiple databases into the raw data stage using Sqoop and Spark.
  • Developed Spark scripts using the Scala shell as per requirements.
  • Developed Scala scripts and UDFs using RDDs, Spark Datasets, DataFrames, and Spark SQL for data transformation, aggregation, queries, and writing data into the data lake.
  • Created Hive managed and external tables to store processed data.
  • Experienced in handling large datasets using partitions, Spark's in-memory capabilities, broadcast variables, persistence, and effective and efficient joins and transformations during data processing (see the sketch after this list).
  • Responsible for creating Hive tables, partitions, and buckets, and loading data.
  • Created Hive queries for data analysis and improved performance using tuning parameters.
  • Optimized algorithms in Hadoop using Spark Context, Spark SQL, DataFrames, and pair RDDs.
  • Worked on a POC comparing the batch-processing time of Spark SQL with Apache Hive.
  • Migrated ETL processes from different data sources to Hive to enable easier data manipulation.
  • Wrote Hive queries for data analysis to meet business requirements.
  • Worked on SequenceFiles, ORC files, bucketing, and partitioning for Hive performance enhancement and storage improvement.
  • Implemented real-time data ingestion and cluster handling using Kafka.
  • Involved in loading and transforming large sets of structured, semi-structured, and unstructured data from relational databases into HDFS using Sqoop imports and Spark SQL.
  • Analyzed data and piped results to Tableau for visualization and reports.
  • Involved in business requirement gathering and analysis.
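A minimal sketch of the broadcast-and-persist pattern described above, assuming a Hive-enabled SparkSession and hypothetical warehouse tables:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.broadcast
import org.apache.spark.storage.StorageLevel

// Broadcast the small dimension table and persist the reused fact DataFrame.
object LargeJoinSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("LargeJoinSketch")
      .enableHiveSupport()
      .getOrCreate()

    val facts = spark.table("warehouse.events") // placeholder fact table
      .persist(StorageLevel.MEMORY_AND_DISK)    // reused below, so cache it

    val dims = spark.table("warehouse.event_types") // small lookup table

    // broadcast() avoids shuffling the large side of the join
    val enriched = facts.join(broadcast(dims), Seq("event_type_id"))

    enriched.groupBy("event_type_id").count().show()
    facts.unpersist()
    spark.stop()
  }
}
```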

Environment: Cloudera, Spark Core, Spark SQL, Spark Streaming, MapReduce, HDFS, Hive, Sqoop, Scala, Python, SQL, HBase, ZooKeeper, PL/SQL, Oracle, Linux, Tableau, MySQL

Confidential

Hadoop Developer

Responsibilities:

  • Involved in ETL, data integration, and migration; imported data from Oracle into HDFS on a regular basis using Sqoop.
  • Used Spark Streaming APIs to perform the necessary transformations and actions on data from Kafka in near real time before landing it in HDFS (see the sketch after this list).
  • Handled importing data from different data sources using Sqoop, performed transformations using Hive, MapReduce, and Spark, and then loaded the data into HDFS.
  • Created external partitioned and bucketed tables in Hive.
  • Wrote HQL queries for data analysis to meet business requirements and Hive scripts to extract, transform, and load data into the database.
  • Involved in migrating MapReduce programs into Spark transformations.
  • Hands-on experience writing Impala and HiveQL queries, Pig Latin scripts, and custom UDFs for Hive and Pig in Python.
  • Created Hive tables to store processed results in a tabular format.
  • Responsible for exporting analyzed data to relational databases and incrementally importing data from databases into HDFS using Sqoop.
  • Created and maintained technical documentation for launching Hadoop clusters and executing Hive queries and Pig scripts; developed Hive and Pig queries to prepare data for BI visualization.
  • Proactively involved in ongoing maintenance, support, and improvements of the Hadoop cluster.
  • Monitored workload, job performance, health and capacity planning using Cloudera Manager.
  • Used JIRA for bug tracking and SVN and GitHub for version control.
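A minimal sketch of the Kafka-to-HDFS streaming flow described above, using the spark-streaming-kafka-0-10 integration; the broker address, topic name, consumer group, and output path are hypothetical placeholders:

```scala
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.KafkaUtils
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe

object KafkaToHdfsSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("KafkaToHdfsSketch")
    val ssc = new StreamingContext(conf, Seconds(30)) // 30-second micro-batches

    val kafkaParams = Map[String, Object](
      "bootstrap.servers" -> "broker1:9092",          // placeholder broker
      "key.deserializer" -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id" -> "hdfs-landing",                    // placeholder group
      "auto.offset.reset" -> "latest"
    )

    val stream = KafkaUtils.createDirectStream[String, String](
      ssc, PreferConsistent, Subscribe[String, String](Seq("events"), kafkaParams))

    // Land each non-empty batch in HDFS as text files
    stream.map(_.value).foreachRDD { rdd =>
      if (!rdd.isEmpty())
        rdd.saveAsTextFile(s"hdfs:///data/raw/events/${System.currentTimeMillis()}")
    }

    ssc.start()
    ssc.awaitTermination()
  }
}
```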

Environment: Cloudera, HDFS, Hive, Spark Core, Spark SQL, Pig, Sqoop, Shell Scripting, IntelliJ, Kafka, Oracle, MS-SQL, Linux, SVN, JIRA.

Confidential

Hadoop/Spark Developer

Responsibilities:

  • Expertise in designing and deploying Hadoop clusters and different big data analytic tools, including Spark, Kafka, Hive, HBase, Oozie, ZooKeeper, and Sqoop, with the Cloudera distribution.
  • Responsible for loading data from the Linux file system into HDFS.
  • Loaded and transformed large sets of structured, semi-structured, and unstructured data.
  • Responsible for developing data pipelines using Flume, Sqoop, and Pig to extract data from weblogs and store it in HDFS.
  • Implemented partitioning, dynamic partitions, and buckets in Hive.
  • Experienced with batch processing of data sources using Apache Spark.
  • Consumed data from Kafka queues using Spark.
  • Experienced in implementing Spark RDD transformations and actions to carry out business analysis.
  • Implemented Spark jobs using Scala and Spark SQL for faster testing and processing of data.
  • Created Hive tables and was involved in data loading and writing Hive UDFs.
  • Loaded data into the cluster from dynamically generated files using Flume and from relational database management systems using Sqoop.
  • Worked on designing NoSQL schemas in HBase.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala (see the sketch after this list).
  • Experience in writing and implementing Impala and Hive ad-hoc queries.
  • Automated extraction of data from warehouses and weblogs by developing workflows and coordinator jobs in Oozie.
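A minimal sketch of rewriting a Hive aggregation as Spark RDD transformations, as described above; the HDFS path, field layout, and query are hypothetical placeholders:

```scala
import org.apache.spark.sql.SparkSession

object HiveToRddSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("HiveToRddSketch").getOrCreate()
    val sc = spark.sparkContext

    // Hive equivalent: SELECT page, COUNT(*) FROM weblogs GROUP BY page
    val counts = sc.textFile("hdfs:///data/weblogs/*") // one log line per record
      .map(_.split("\t"))
      .filter(_.length > 1)              // drop malformed lines
      .map(fields => (fields(1), 1L))    // assume field 1 holds the page
      .reduceByKey(_ + _)                // the GROUP BY ... COUNT(*) step

    counts.take(20).foreach(println)
    spark.stop()
  }
}
```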

Environment: Hortonworks, Spark, Spark Streaming, Spark SQL, Kafka, Sqoop, Hadoop, MapReduce, HDFS, Hive, Java, Scala, Oracle, GitHub, Shell Scripting.
