Hadoop Developer Resume
CA
SUMMARY:
- 5 years of overall experience in building Hadoop MapReduce solutions, with additional experience using Hive, Impala, Pig, Spark, Flume, and Kafka.
- Experience in installation, configuration, supporting and monitoring Hadoop clusters using Cloudera distributions and AWS.
- Good experience in writing Python Scripts.
- Good experience with both JobTracker (MapReduce 1) and YARN (MapReduce 2).
- Good experience in Spark and its related technologies such as Spark SQL and Spark Streaming.
- Working experience in a DevOps environment.
- Experience in defining detailed application software test plans, including organization, participants, schedule, and test and application coverage scope.
- Good understanding of Apache Hue and Accumulo.
- Techno-functional responsibilities include interfacing with users, identifying functional and technical gaps, estimating effort, designing custom solutions, development, leading developers, producing documentation, and production support.
- Good understanding of version control systems such as GitHub and SVN.
- Experience in importing and exporting data using Sqoop from HDFS to relational database systems and vice versa.
- Experience in converting Hive queries into Spark transformations using Spark RDDs and Scala (a brief PySpark sketch of the pattern follows this summary).
- Experience with the RDD architecture, implementing Spark operations on RDDs, and optimizing Spark transformations and actions.
- Expertise in using various tools in the Hadoop ecosystem including MapReduce, Hive, Pig, Oozie, Sqoop, HBase, Flume, Spark, Kafka, and Zookeeper.
- Knowledge of ETL methods for data extraction, transformation and loading in corporate-wide ETL Solutions and Data warehouse tools for reporting and data analysis.
- Extending Hive and Pig core functionality by writing custom UDFs.
- Experience in analyzing data using HQL, Pig Latin, and custom Map Reduce programs in core Java.
- Knowledge of job workflow scheduling and monitoring tools like Oozie.
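As a brief illustration of the Hive-to-Spark conversion experience noted above, the following is a minimal PySpark sketch of rewriting a Hive aggregation as RDD transformations. The table and column names (sales, region, amount) are hypothetical placeholders, and the original work also used Scala.

    # Equivalent Hive query: SELECT region, SUM(amount) FROM sales GROUP BY region;
    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .appName("hive-to-spark-example")
             .enableHiveSupport()
             .getOrCreate())

    # Read the Hive table, then express the aggregation as RDD transformations.
    totals = (spark.table("sales").rdd
              .map(lambda row: (row["region"], row["amount"]))
              .reduceByKey(lambda a, b: a + b))

    for region, total in totals.collect():
        print(region, total)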
TECHNICAL SKILLS:
Big Data: Hadoop HDFS, MapReduce, Hive, Impala, Pig, HBase, ZooKeeper, Sqoop, Oozie, Spark, Scala, Flume, Kafka, and Avro.
Programming Languages: C, C++, JAVA/J2EE, Python.
Methodologies: AGILE, Waterfall.
Web Technologies: HTML5, CSS3, JavaScript, jQuery, AJAX, JSON.
Java Technologies: Servlets, JSP, EJB, web services, JDBC, JSON
Databases: Oracle 11g/10g, DB2, SQL Server, MySQL, MS-Access
Application Servers: WebLogic, WebSphere.
Monitoring and Reporting Tools: Ganglia, Custom Shell scripts.
Version Control: Perforce, SVN, Git, Bitbucket
PROFESSIONAL EXPERIENCE:
Confidential, CA
Hadoop Developer
Responsibilities:
- Involved in loading data from UNIX file system to HDFS.
- Involved in creating Hive tables, loading them with data, and writing Hive queries (HiveQL) that run internally as MapReduce jobs.
- Designed and developed data pipelines for different application data events to filter consumer response data and load it from an AWS S3 bucket into Hive external tables.
- Worked with different file formats like JSON, Avro, CSV, ORC, and Parquet and compression techniques like Snappy and Zlib.
- Followed Agile Scrum methodology for the entire project.
- Selected the appropriate AWS services based on data, compute, and system requirements.
- Involved in the design and development of generic PySpark programs in Python to reduce the delivery time of data processing applications.
- Designed and implemented data-check and data-quality frameworks in PySpark for the initial load and final publish stages.
- Used AWS EMR for processing ETL jobs and loading to S3 buckets, and AWS Athena for ad hoc/low-latency querying of S3 data.
- Developed Python code for workflow management and automation in Airflow.
- Implemented Spark and Hive best practices and optimizations to process data efficiently, using features such as partitioning, resource tuning, and memory management.
- Developed UDFs in PySpark to anonymize users' personal data and created a framework to delete inactive users (see the sketch at the end of this section).
- Used Bitbucket as the code repository and Jenkins as the continuous integration tool.
Environment: Linux, Hadoop, Spark, HBase, Sqoop, Pig, Impala, Hive, HQL, Flume, AWS, Zookeeper, Elasticsearch, Maven, DevOps, Agile, Oracle 11g, Cloudera.
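As an illustration of the PySpark anonymization UDFs mentioned above, here is a minimal sketch under stated assumptions: the table name (users), the column (email), and the SHA-256 hashing approach are hypothetical placeholders, not details from the project.

    # Minimal PySpark sketch: anonymize a personal-data column with a UDF.
    import hashlib

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col, udf
    from pyspark.sql.types import StringType

    spark = (SparkSession.builder
             .appName("anonymize-example")
             .enableHiveSupport()
             .getOrCreate())

    @udf(returnType=StringType())
    def anonymize(value):
        # Replace the raw value with a one-way SHA-256 hash (hypothetical scheme).
        return hashlib.sha256(value.encode("utf-8")).hexdigest() if value else None

    users = spark.table("users").withColumn("email", anonymize(col("email")))
    users.write.mode("overwrite").saveAsTable("users_anonymized")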
Confidential, NC
Hadoop Developer
Responsibilities:
- Responsible for designing and implementing the data pipeline using Big Data tools including Hive, Oozie, Spark, Sqoop, and Kafka, along with AWS EC2, S3, and EMR.
- Used Sqoop to extract and load incremental and non-incremental data from RDBMS systems into Hadoop.
- Involved in converting JSON data into DataFrames and storing them in Hive tables.
- Created multiple groups and set permission policies for various groups in AWS.
- Created streaming cubes and persisted them into HBase for building OLAP cubes.
- Used the Parquet file format with Snappy compression and addressed the Hive small-files problem using the hive.merge.mapfiles and hive.merge.mapredfiles parameters.
- Converted existing snowflake-schema data into a star schema in Hive for building OLAP cubes.
- Extensively used Hive optimization techniques like partitioning, bucketing, Map Join and parallel execution.
- Converted existing Sqoop and Hive jobs to Spark SQL applications that read data from Oracle over JDBC and write it to Hive tables (see the sketch at the end of this section).
- Analyzed the SQL scripts and designed the solution for implementation in Spark with Scala.
- Involved in converting Hive/SQL queries into Spark transformations using Spark DataFrames.
- Developed shell scripts for removing orphan partitions from Hive tables and for archive retention in HDFS.
- Explored Spark for improving the performance and optimization of existing Hadoop jobs using the Spark context, Spark SQL, Spark Streaming, DataFrames, pair RDDs, and Spark on YARN.
- Validated fact table data migrated on a daily load basis.
- Used AWS EMR (Elastic Map Reduce) for resource intensive transformation jobs.
Environment: Hive, Spark, S3, AWS, SQL, DB2, Impala, Tableau, Git, Kafka, Zookeeper, YARN, Unix shell scripting, Cloudera, HBase, Elastic MapReduce.
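As an illustration of the Sqoop/Hive-to-Spark-SQL conversion pattern mentioned above, here is a minimal PySpark sketch; the connection details, credentials, table names, and partition bounds are hypothetical placeholders, and the original applications were not necessarily structured this way.

    # Minimal PySpark sketch: read an Oracle table over JDBC and write it to Hive.
    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .appName("oracle-to-hive-example")
             .enableHiveSupport()
             .getOrCreate())

    orders = (spark.read.format("jdbc")
              .option("url", "jdbc:oracle:thin:@//db-host:1521/ORCLPDB")
              .option("dbtable", "SALES.ORDERS")
              .option("user", "etl_user")
              .option("password", "********")
              .option("driver", "oracle.jdbc.OracleDriver")
              # Split the read into parallel partitions on a numeric key.
              .option("partitionColumn", "ORDER_ID")
              .option("lowerBound", "1")
              .option("upperBound", "10000000")
              .option("numPartitions", "8")
              .load())

    (orders.write
     .mode("overwrite")
     .format("parquet")
     .option("compression", "snappy")
     .saveAsTable("staging.orders"))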
Confidential, San Jose, CA
Big Data Developer
Responsibilities:
- Worked on data querying tool Hive to store and retrieve data.
- Reviewed and managed Hadoop log files by consolidating logs from multiple machines using Flume.
- Developed Oozie workflows for scheduling ETL processes and Hive scripts.
- Involved in writing queries in Spark SQL using Scala.
- Integrated Spark with MapR-DB using Scala to persist data into Elasticsearch, among other use cases.
- Exported data from Impala to the Tableau reporting tool and created dashboards on a live connection.
- Designed multiple Python packages used within a large ETL process that loaded 2TB of data from an existing Oracle database into a new PostgreSQL cluster (a minimal sketch follows this section).
- Developed Spark code using Scala and Spark-SQL for faster testing and processing of data.
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs.
- Developed UDFs using both DataFrames/SQL and RDDs in Spark for data aggregation queries, and exported the results back into OLTP systems through Sqoop.
- Developed multiple MapReduce jobs in Java to clean datasets.
- Collected log data from web servers and integrated it into HDFS using Flume.
- Developed UNIX shell scripts for creating the reports from Hive data.
- Manipulated, serialized, and modeled data in multiple formats such as JSON and XML.
- Prepared Avro schema files for generating Hive tables.
- Created Hive tables, loaded data into them, and queried the data using HQL.
Environment: Hadoop MapReduce 2 (YARN), ZooKeeper, Scala, HDFS, Pig, Hive, Flume, Eclipse, Ignite, Core Java, Sqoop, Spark, Agile, Spark SQL, DevOps, Cloudera, Linux shell scripting.
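As an illustration of the Python ETL packages mentioned above (Oracle to PostgreSQL), here is a minimal, chunked copy-step sketch. The library choices (cx_Oracle, psycopg2), connection details, and the customers table and its columns are assumptions for illustration only.

    # Minimal sketch: copy rows from Oracle to PostgreSQL in batches.
    import cx_Oracle
    import psycopg2

    BATCH_SIZE = 10000

    src = cx_Oracle.connect("etl_user", "********", "db-host:1521/ORCL")
    dst = psycopg2.connect(host="pg-host", dbname="warehouse",
                           user="etl_user", password="********")

    read_cur = src.cursor()
    write_cur = dst.cursor()
    read_cur.execute("SELECT id, name, created_at FROM customers")

    while True:
        rows = read_cur.fetchmany(BATCH_SIZE)
        if not rows:
            break
        write_cur.executemany(
            "INSERT INTO customers (id, name, created_at) VALUES (%s, %s, %s)",
            rows,
        )
        dst.commit()  # commit each batch so a failure does not lose all progress

    src.close()
    dst.close()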
Confidential
Hadoop Developer
Responsibilities:
- Extensively worked on importing data from SQL Server and converting stored procedures to Spark jobs.
- Developed common utilities for Spark jobs to import data in parallel from source RDBMSs while handling data skew.
- Developed a Python framework for loading incremental data back to SQL Server from Hive (see the sketch at the end of this section).
- Worked with different file formats like JSON, Avro, ORC, and Parquet and compression techniques like Snappy.
- Extensively used Spark optimization techniques, including but not limited to repartitioning and memory parameter tuning, to decrease job processing time.
- Used AWS services such as S3 for storing data, and EC2, EBS, and RDS for spinning up instances on demand.
- Extensively used Hive optimization techniques for improving query performance and LLAP/Drill for low-latency end-user queries.
- Developed a Spark application to filter JSON source data in an AWS S3 location and store it in HDFS.
- Used Stonebranch as the workflow orchestration tool for scheduling ETL jobs.
- Worked on a POC using Apache Kylo as a self-service tool based on Apache Spark and NiFi; Kylo automates many of the tasks associated with data lakes, such as data ingest, preparation, discovery, profiling, and management.
- Wrote complex SQL queries and stored procedures.
- Provided cluster coordination services through ZooKeeper.
- Followed Agile methodology to analyze, define, and document the application supporting functional and business requirements.
- Worked on a POC comparing the processing time of Impala with Apache Hive for batch applications, to evaluate adopting Impala in the project.
Environment: Hive, Impala, HBase, UNIX, Hortonworks, MySQL, AWS.
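As an illustration of the Hive-to-SQL Server load-back framework mentioned above, here is a minimal PySpark sketch; the connection details, table names, and the load_date watermark column are hypothetical placeholders rather than details from the project.

    # Minimal PySpark sketch: push incremental rows from a Hive table to SQL Server.
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col

    spark = (SparkSession.builder
             .appName("hive-to-sqlserver-example")
             .enableHiveSupport()
             .getOrCreate())

    # Watermark of the last successful load; a real job would get this from the scheduler.
    last_loaded = "2019-01-01"
    incremental = spark.table("warehouse.orders").where(col("load_date") > last_loaded)

    (incremental.write.format("jdbc")
     .option("url", "jdbc:sqlserver://sql-host:1433;databaseName=reporting")
     .option("dbtable", "dbo.orders")
     .option("user", "etl_user")
     .option("password", "********")
     .option("driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver")
     .mode("append")
     .save())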