
Hadoop/Big Data Developer Resume

CO

SUMMARY:

  • Overall 12+ years of experience in enterprise application and product development.
  • Experience in developing and deploying enterprise applications using major components of the Hadoop ecosystem such as HDFS, MapReduce, YARN, Hive, Pig, HBase, Flume, Sqoop, Spark, Storm, Scala, Kafka, Oozie, Zookeeper, MongoDB, and Cassandra.
  • Experience in Hadoop administration activities such as installation and configuration of clusters using Cloudera Manager (CDH) and Apache Ambari (HDP).
  • Hands-on experience in installing, configuring, and troubleshooting Hadoop ecosystem components such as MapReduce, HDFS, Hive, Pig, Sqoop, Spark, Flume, Zookeeper, Hue, Kafka, Storm, and Impala.
  • Experience configuring and managing user permissions in Hue.
  • Expertise in Hadoop architecture and its components, including HDFS, JobTracker, TaskTracker, NameNode, DataNode, MRv1, and MRv2.
  • Extensive experience in writing MapReduce programs in Java for data extraction, transformation, and aggregation from multiple file formats including XML, JSON, CSV, and other compressed formats.
  • Built automated test frameworks to test data values, data integrity, source-to-target record counts, and field mappings between transactional, analytical data warehouse, and reporting systems.
  • Experience reading and writing Hive data through Spark SQL and DataFrames (see the first sketch after this list).
  • Experienced in importing and exporting data with Sqoop between HDFS and relational database systems (RDBMS).
  • Involved in setting up and monitoring cluster services such as HBase, Flume, Impala, Hive, Pig, and Kafka.
  • Used machine learning algorithms to train models for sentiment analysis on client data.
  • Worked on processing customer feedback data from Press Ganey surveys with Spark and Scala, storing results in Hive tables for further analysis with Tableau.
  • Proficient in programming with Spark and Scala and with related framework components such as Impala; also able to code in Python.
  • Proficient in developing data transformation and other analytical applications in Spark and Spark SQL using the Scala programming language.
  • Solid experience in creating real-time data streaming solutions using Apache Spark Streaming and Kafka (see the second sketch after this list).
  • Experienced in developing Spark scripts for data analysis in both Python and Scala.
  • Experience developing Scala applications for loading/streaming data into NoSQL databases (HBase) and into HDFS.
  • Experience in developing and designing POCs deployed on the YARN cluster; compared the performance of Spark with Hive and SQL/Oracle.
  • Strong exposure to NoSQL databases such as HBase, MongoDB, and Cassandra.
  • Good knowledge of Kafka and messaging systems such as RabbitMQ.
  • Transferred files from Amazon AWS S3 buckets to the production cluster.
  • Involved in granting access to Cloudera Manager, Hue, and Oozie for internal and external users by creating SR requests.
  • Granted and revoked user privileges on both the dev and prod clusters.
  • Ran daily health tests on the services managed by Cloudera Manager.
  • Downloaded data from FTP to local clusters with shell scripts and loaded it into target tables using partitioning and bucketing techniques.
  • Set up jobs in crontab as well as Oozie workflows and coordinators.
  • Monitored all crontab and Oozie jobs and debugged issues when any job failed to complete.
  • Involved in data analytics and reporting with Tableau.
  • Sent daily status of the services running on the cluster to the Scrum Master.
  • Involved in creating documentation for all jobs running on the cluster.
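
First sketch referenced above: a minimal example of reading and writing Hive data through Spark SQL and DataFrames. It assumes Spark 2.x with Hive support on the classpath; the database, table, and column names are hypothetical.

```scala
import org.apache.spark.sql.SparkSession

object HiveReadWrite {
  def main(args: Array[String]): Unit = {
    // SparkSession with Hive support enabled so spark.sql sees the metastore
    val spark = SparkSession.builder()
      .appName("HiveReadWrite")
      .enableHiveSupport()
      .getOrCreate()

    // Read an existing Hive table into a DataFrame
    val feedback = spark.sql(
      "SELECT survey_id, score, comment FROM surveys.feedback")

    // Aggregate with the DataFrame API instead of a second SQL string
    val avgScores = feedback.groupBy("survey_id").avg("score")

    // Write the result back to Hive for downstream reporting (e.g. Tableau)
    avgScores.write.mode("overwrite").saveAsTable("surveys.feedback_avg")

    spark.stop()
  }
}
```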
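
Second sketch referenced above: the Spark Streaming + Kafka pattern, using the spark-streaming-kafka-0-10 direct stream API. The broker address, topic, and group id are placeholders.

```scala
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.{ConsumerStrategies, KafkaUtils, LocationStrategies}

object KafkaFeed {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("KafkaFeed")
    val ssc = new StreamingContext(conf, Seconds(10)) // 10-second micro-batches

    // Consumer configuration; broker address and group id are placeholders
    val kafkaParams = Map[String, Object](
      "bootstrap.servers"  -> "broker1:9092",
      "key.deserializer"   -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id"           -> "stream-consumers",
      "auto.offset.reset"  -> "latest"
    )

    val stream = KafkaUtils.createDirectStream[String, String](
      ssc,
      LocationStrategies.PreferConsistent,
      ConsumerStrategies.Subscribe[String, String](Seq("events"), kafkaParams))

    // Count records per batch as a stand-in for the real transformation logic
    stream.map(_.value).count().print()

    ssc.start()
    ssc.awaitTermination()
  }
}
```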

TECHNICAL SKILLS:

Big Data: Hadoop, MapReduce, HDFS, HBase, Zookeeper, Hive, Pig, Sqoop, Cassandra, Oozie, Flume, Elasticsearch

Technical Skills: Core Java, C, C++, Python (Django), C#, VC++

Methodologies: Agile, UML, Design Patterns

Programming Language: Scala, Core Java

Database: Oracle, MySQL, Cassandra, HBase

Application Server: Apache Tomcat

Web Tools: HTML, JavaScript, XPath, XQuery

Tools: SQL developer, Toad

IDE: STS, Eclipse

Operating System: Windows, Unix/Linux

Scripts: Bash, Python, Perl

PROFESSIONAL EXPERIENCE:

Confidential, CO

Hadoop/Big Data Developer

Responsibilities:

  • Worked on a live 50-node Hadoop cluster running CDH 5.8.
  • Worked with highly unstructured and semi-structured data of 1000 TB in size (270 GB with a replication factor of 3).
  • Extensively involved in the design phase and delivered design documents. Experience in the Hadoop ecosystem with HDFS, Hive, Sqoop, and Spark with Scala.
  • Explored Spark for improving the performance and optimization of existing algorithms in Hadoop using Spark Context, Spark SQL, DataFrames, pair RDDs, and YARN.
  • Worked on analyzing the Hadoop cluster and various big data components including Hive, Spark, Kafka, Elasticsearch, Oozie, and Sqoop; imported and exported data into HDFS and Hive using Sqoop.
  • Extensive experience writing Pig (version 0.15) scripts to transform raw data from several big data sources into baseline datasets.
  • Developed Hive (version 1.2.1) scripts for end-user/analyst requirements to perform ad hoc analysis.
  • Very good understanding of partitioning and bucketing concepts in Hive; designed both managed and external tables in Hive to optimize performance (see the sketch after this list).
  • Solved performance issues in Hive and Pig scripts with an understanding of joins, grouping, and aggregation and how they translate to MapReduce jobs.
  • Experienced in defining job flows. Used Hive to analyze partitioned and bucketed data and compute various metrics for reporting. Experienced in managing and reviewing Hadoop log files.
  • Extracted data from Teradata/RDBMS into HDFS using Sqoop (version 1.4.6).
  • Created and ran Sqoop (version 1.4.6) jobs with incremental load to populate Hive external tables.
  • Performed data ingestion with Spark SQL, creating Spark DataFrames.
  • Strong knowledge of multi-cluster environments and setting up the Cloudera Hadoop ecosystem. Experience in installation, configuration, and management of Hadoop clusters.
  • Provided design recommendations and thought leadership to sponsors/stakeholders that improved review processes and resolved technical problems.
  • Worked on managing big data/Hadoop logs.
  • Developed shell scripts for Oozie workflows.
  • Worked with the BI reporting tool Tableau for generating reports.
  • Integrated Talend with Hadoop for processing big data jobs.
  • Good knowledge of Solr and Kafka.
  • Shared responsibility for administration of Hadoop, Hive, and Pig.
  • Installed and configured Storm, Solr, Flume, Sqoop, Pig, Hive, and HBase on Hadoop clusters.
  • Provided detailed reporting of work as required by project status reports.
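
A minimal sketch of the external-table and partitioning work above, issued through spark.sql for consistency with the other Scala examples. The database, table, and path names are hypothetical, and the bucketing (CLUSTERED BY) clause used on the Hive side is left in a comment because some Spark versions will not create Hive-serde bucketed tables.

```scala
import org.apache.spark.sql.SparkSession

object PartitionedTables {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("PartitionedTables")
      .enableHiveSupport()
      .getOrCreate()

    // External table: Hive tracks only metadata, so dropping the table keeps
    // the files at LOCATION. On the Hive CLI this DDL also carried a
    // CLUSTERED BY (customer_id) INTO 32 BUCKETS clause.
    spark.sql("""
      CREATE EXTERNAL TABLE IF NOT EXISTS sales.orders (
        order_id    BIGINT,
        customer_id BIGINT,
        amount      DOUBLE)
      PARTITIONED BY (order_date STRING)
      STORED AS PARQUET
      LOCATION '/data/sales/orders'""")

    // Dynamic-partition insert: each distinct order_date becomes a partition,
    // so queries filtered on order_date scan only the matching directories
    spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")
    spark.sql("""
      INSERT OVERWRITE TABLE sales.orders PARTITION (order_date)
      SELECT order_id, customer_id, amount, order_date
      FROM staging.orders_raw""")

    spark.stop()
  }
}
```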

Confidential, CA

Hadoop / Big Data Developer

Responsibilities:

  • Developed high-integrity programs in Spark for systems where predictable and highly reliable operation is essential.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala (see the first sketch after this list).
  • Implemented Flume, Spark, and Spark Streaming frameworks for real-time data processing.
  • Developed an analytical component using Scala, Spark, and Spark Streaming.
  • Analyzed the SQL scripts and designed the solution for implementation in PySpark.
  • Developed several advanced MapReduce programs in Java, as well as Python programs, as part of functional requirements for big data.
  • Developed Hive (version 1.1.1) scripts as part of functional requirements, along with Hadoop security with Kerberos.
  • Worked with the admin team in designing and executing the upgrade from CDH 3 to CDH 4.
  • Developed UDFs in Java as needed for use in Pig and Hive queries.
  • Assisted the Hadoop team with developing MapReduce scripts in Python.
  • Built, tuned, and maintained HiveQL and Pig scripts for loading, filtering, and storing data and for user reporting. Involved in creating Hive tables, loading data, and writing Hive queries.
  • Handled importing of data from various data sources, performed transformations using Spark, and loaded data into Cassandra (see the second sketch after this list).
  • Worked extensively on the Core, Spark SQL, and Spark Streaming modules of Spark.
  • Used Scala to write code for all Spark use cases.
  • Explored Spark for improving the performance and optimization of existing Hadoop algorithms using Spark Context, Spark SQL, DataFrames, pair RDDs, and YARN.
  • Involved in Spark-Cassandra data modeling.
  • Developed a data pipeline using Flume, Sqoop, Pig, and Java MapReduce to ingest customer behavioral data and financial histories into HDFS for analysis.
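
First sketch referenced above: one way a HiveQL aggregation maps onto Spark transformations in Scala (Spark 2.x assumed; the table and column names are hypothetical).

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, sum}

object HiveToSpark {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("HiveToSpark")
      .enableHiveSupport()
      .getOrCreate()

    // Original HiveQL:
    //   SELECT customer_id, SUM(amount) FROM sales.orders
    //   WHERE amount > 0 GROUP BY customer_id
    // The same logic as DataFrame transformations:
    val totals = spark.table("sales.orders")
      .filter(col("amount") > 0)
      .groupBy("customer_id")
      .agg(sum("amount").alias("total_amount"))

    // Or, closer to the original MapReduce shape, as a pair RDD:
    val rddTotals = spark.table("sales.orders").rdd
      .map(r => (r.getAs[Long]("customer_id"), r.getAs[Double]("amount")))
      .filter(_._2 > 0)
      .reduceByKey(_ + _)

    totals.show(20)
    spark.stop()
  }
}
```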
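
Second sketch referenced above: loading transformed data into Cassandra with the DataFrame writer. This assumes the DataStax spark-cassandra-connector is on the classpath; the host, keyspace, table, and file names are placeholders.

```scala
import org.apache.spark.sql.SparkSession

object LoadToCassandra {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("LoadToCassandra")
      .config("spark.cassandra.connection.host", "cassandra-host") // placeholder
      .getOrCreate()

    // Ingest raw CSV from HDFS and drop rows missing the primary key
    val customers = spark.read
      .option("header", "true")
      .csv("hdfs:///data/incoming/customers.csv")
      .na.drop(Seq("customer_id"))

    // Append into an existing Cassandra keyspace/table
    customers.write
      .format("org.apache.spark.sql.cassandra")
      .options(Map("keyspace" -> "analytics", "table" -> "customers"))
      .mode("append")
      .save()

    spark.stop()
  }
}
```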

Environment: Core Java, multi-node installation, MapReduce, Spark, Kafka, Hive, Impala, Flume, Storm, Zookeeper, Oozie, Java, Python scripting, Scala, JDK, UNIX shell scripting, TestNG, MySQL, Eclipse, Toad, Tableau 8.x/9.x.

Confidential, CA

Sr. C++/Core Java/Script Developer

Responsibilities:

  • Coordinated with the onsite team for requirements gathering.
  • Involved in understanding the business needs and formulated the low-level design for Hard to Borrow.
  • Created various use cases using massive public data sets, and various performance tests for verifying the efficacy of codegen in various modes.
  • Researched, collated, and edited content related to Perl and tools; shared knowledge with peers.
  • Designed the state-machine framework for Hard to Borrow using the Singleton pattern in C++.
  • Created shared object (.so) files for the Hard to Borrow business logic so that any client can invoke it.
  • Developed Perl jobs to insert and update SunGard data into the Sybase database in E*TRADE format.
  • Carried out unit testing on the developed code via a C++ tester application, covering all possible positive and negative scenarios to maintain the quality and performance of the code.
  • Assigned logical units of work to each developer at offshore (a team of 7 members).

Environment: C, C++, Core Java, Perl, Python, Linux, MySQL, Sybase.

Confidential

C++/C# developer

Responsibilities:

  • Coding/enhancement of programs.
  • Resolution and monitoring of problem tickets.
  • Supporting on call activities.
  • Status reporting, work planning, and documentation

Environment: C, C++, C#, XML, Windows.

Confidential

C++/VC++ Developer

Responsibilities:

  • Coding/enhancement of programs.
  • Resolution and monitoring of problem tickets.
  • Supporting on call activities.
  • Status reporting, work planning, and documentation

Environment: C++, C#, VC++, XML, MySQL, Oracle.
