
Hadoop/Big Data Developer Resume


CO

SUMMARY

  • Overall 12+ years of experience in enterprise application and product development.
  • Experience in developing and deploying enterprise applications using major components of the Hadoop ecosystem such as HDFS, MapReduce, YARN, Hive, Pig, HBase, Flume, Sqoop, Spark, Storm, Scala, Kafka, Oozie, Zookeeper, MongoDB, and Cassandra.
  • Experience in Hadoop administration activities such as installation and configuration of clusters using Cloudera Manager and Apache Ambari (HDP).
  • Hands-on experience in installing, configuring, and troubleshooting Hadoop ecosystem components such as MapReduce, HDFS, Hive, Pig, Sqoop, Spark, Flume, Zookeeper, Hue, Kafka, Storm, and Impala.
  • Configured and managed user permissions in Hue.
  • Expertise in Hadoop architecture and its components, including HDFS, JobTracker, TaskTracker, NameNode, DataNode, MRv1, and MRv2.
  • Extensive experience in writing MapReduce programs in Java for data extraction, transformation, and aggregation across multiple file formats, including XML, JSON, CSV, and various compressed formats.
  • Built automated test frameworks to verify data values, data integrity, source-to-target record counts, and field mappings between transactional, analytical data warehouse, and reporting systems.
  • Read and wrote Hive data through Spark SQL and DataFrames (see the first sketch following this list).
  • Experienced in importing and exporting data between HDFS and relational database systems (RDBMS) using Sqoop.
  • Involved in setting up and monitoring all services running on the cluster, such as HBase, Flume, Impala, Hive, Pig, and Kafka.
  • Used machine learning algorithms to train models for sentiment analysis on client data.
  • Processed customer feedback data from Press Ganey surveys with Spark and Scala, storing results in Hive tables for further analysis with Tableau.
  • Proficient in programming with Spark and Scala, experienced with related framework components such as Impala, and able to code in Python as well.
  • Proficient in developing data transformation and other analytical applications in Spark and Spark SQL using the Scala programming language.
  • Profound experience in creating real-time data streaming solutions using Apache Spark Streaming and Kafka (see the second sketch following this list).
  • Experienced in developing Spark scripts for data analysis in both Python and Scala.
  • Experience developing Scala applications for loading/streaming data into NoSQL databases (HBase) and into HDFS.
  • Experience in developing and designing POCs deployed on YARN clusters, comparing the performance of Spark with Hive and SQL/Oracle.
  • Strong exposure to NoSQL databases such as HBase, MongoDB, and Cassandra.
  • Good knowledge of Kafka and messaging systems such as RabbitMQ.
  • Retrieved files from Amazon AWS S3 buckets into the production cluster.
  • Granted internal and external users access to Cloudera Manager, Hue, and Oozie by creating SR requests.
  • Granted and revoked user privileges on both dev and prod clusters.
  • Ran daily health tests on the services running on Cloudera.
  • Downloaded data from FTP to the local clusters with shell scripts for loading into target tables, applying partitioning and bucketing techniques.
  • Set up jobs in crontab and created Oozie workflows and coordinators.
  • Monitored all crontab and Oozie jobs and debugged issues whenever a job failed to complete.
  • Involved in data analytics and reporting with Tableau.
  • Sent daily status of the services running on the cluster to the scrum master.
  • Involved in creating documentation for all jobs running on the cluster.
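
A minimal Scala sketch of the Spark SQL / DataFrame pattern for reading and writing Hive data mentioned above. It uses the Spark 2.x SparkSession API (on older Spark 1.6 clusters the same pattern goes through HiveContext); the table and column names are hypothetical placeholders, not an actual client schema:

    import org.apache.spark.sql.SparkSession

    object HiveReadWriteSketch {
      def main(args: Array[String]): Unit = {
        // enableHiveSupport() routes spark.sql() and saveAsTable() through the Hive metastore
        val spark = SparkSession.builder()
          .appName("HiveReadWriteSketch")
          .enableHiveSupport()
          .getOrCreate()

        // Read an existing Hive table into a DataFrame (table name is a placeholder)
        val feedback = spark.sql(
          "SELECT survey_id, comment_text, score FROM staging.patient_feedback")

        // A simple transformation: keep low-scoring responses for follow-up analysis
        val lowScores = feedback.filter(feedback("score") < 3)

        // Write the result back to Hive for downstream reporting (e.g., Tableau)
        lowScores.write.mode("overwrite").saveAsTable("analytics.low_score_feedback")

        spark.stop()
      }
    }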
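And a sketch of the real-time streaming pattern with Spark Streaming and Kafka, using the spark-streaming-kafka-0-10 direct-stream API; the broker address, topic, and consumer group id are assumed placeholders:

    import org.apache.kafka.common.serialization.StringDeserializer
    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.kafka010._

    object KafkaStreamSketch {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf().setAppName("KafkaStreamSketch")
        val ssc = new StreamingContext(conf, Seconds(10)) // 10-second micro-batches

        val kafkaParams = Map[String, Object](
          "bootstrap.servers"  -> "broker1:9092",             // placeholder broker
          "key.deserializer"   -> classOf[StringDeserializer],
          "value.deserializer" -> classOf[StringDeserializer],
          "group.id"           -> "feedback-consumers",       // placeholder group id
          "auto.offset.reset"  -> "latest"
        )

        // Direct stream: each Kafka partition maps to one Spark partition
        val stream = KafkaUtils.createDirectStream[String, String](
          ssc,
          LocationStrategies.PreferConsistent,
          ConsumerStrategies.Subscribe[String, String](Seq("events"), kafkaParams))

        // Count records per micro-batch as a stand-in for real processing
        stream.map(_.value).count().print()

        ssc.start()
        ssc.awaitTermination()
      }
    }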

TECHNICAL SKILLS

Big Data: Hadoop, MapReduce, HDFS, HBase, Zookeeper, Hive, Pig, Sqoop, Cassandra, Oozie, Flume, Elasticsearch

Languages: Core Java, C, C++, Python (Django), C#, VC++

Methodologies: Agile, UML, Design Patterns

Programming Languages: Scala, Core Java

Database: Oracle, MySQL, Cassandra, HBase

Application Server: Apache Tomcat

Web Tools: HTML, JavaScript, XPath, XQuery

Tools: SQL Developer, Toad

IDE: STS, Eclipse

Operating System: Windows, Unix/Linux

Scripts: Bash, Python, Perl

PROFESSIONAL EXPERIENCE

Confidential, CO

Hadoop/Big Data Developer

Responsibilities:

  • Worked on a live 50-node Hadoop cluster running CDH 5.8.
  • Worked with highly unstructured and semi-structured data of 1000 TB in size (270 GB with a replication factor of 3).
  • Extensively involved in the design phase and delivered design documents; experienced in the Hadoop ecosystem with HDFS, Hive, Sqoop, and Spark with Scala.
  • Explored Spark to improve the performance and optimization of existing algorithms in Hadoop using SparkContext, Spark SQL, DataFrames, pair RDDs, and YARN.
  • Worked on analyzing the Hadoop cluster and different big data components, including Hive, Spark, Kafka, Elasticsearch, Oozie, and Sqoop.
  • Imported and exported data into HDFS and Hive using Sqoop.
  • Extensive experience in writing Pig (version 0.15) scripts to transform raw data from several big data sources into baseline data sets.
  • Developed Hive (version 1.2.1) scripts for end-user/analyst requirements to perform ad hoc analysis.
  • Very good understanding of partitioning and bucketing concepts in Hive; designed both managed and external tables in Hive to optimize performance (see the DDL sketch following this list).
  • Solved performance issues in Hive and Pig scripts by understanding joins, grouping, and aggregation, and how they translate to MapReduce jobs.
  • Experienced in defining job flows. Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting.
  • Experienced in managing and reviewing the Hadoop log files.
  • Extracted data from Teradata/RDBMS into HDFS using Sqoop (version 1.4.6).
  • Created and ran Sqoop (version 1.4.6) jobs with incremental loads to populate Hive external tables (see the Sqoop job sketch following this list).
  • Ingested data with Spark SQL and created Spark DataFrames.
  • Strong knowledge of multi-clustered environments and setting up the Cloudera Hadoop ecosystem; experience in installation, configuration, and management of Hadoop clusters.
  • Provided design recommendations and thought leadership to sponsors/stakeholders that improved review processes and resolved technical problems.
  • Worked on managing big data/Hadoop logs.
  • Developed shell scripts for Oozie workflows.
  • Worked on the BI reporting tool Tableau for generating reports.
  • Integrated Talend with Hadoop for processing big data jobs.
  • Good knowledge of Solr and Kafka.
  • Shared responsibility for administration of Hadoop, Hive and Pig.
  • Installed and configured Storm, Solr, Flume, Sqoop, Pig, Hive, HBase on Hadoop clusters.
  • Provided detailed reporting of work as required by project status reports.
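
A sketch of the partitioned, bucketed external-table DDL pattern referenced above, issued from Scala through spark.sql; the database, table, columns, bucket count, and HDFS location are illustrative assumptions rather than the actual schema:

    import org.apache.spark.sql.SparkSession

    object HiveTableDdlSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("HiveTableDdlSketch")
          .enableHiveSupport()
          .getOrCreate()

        // External table: Hive tracks only metadata; the data stays at LOCATION
        // and survives a DROP TABLE. Partitioning by date prunes whole directories
        // at query time; bucketing by customer_id helps joins and sampling on that key.
        spark.sql("""
          CREATE EXTERNAL TABLE IF NOT EXISTS analytics.orders (
            order_id    BIGINT,
            customer_id BIGINT,
            amount      DOUBLE
          )
          PARTITIONED BY (order_date STRING)
          CLUSTERED BY (customer_id) INTO 32 BUCKETS
          STORED AS ORC
          LOCATION '/data/analytics/orders'
        """)

        spark.stop()
      }
    }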
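The incremental Sqoop (1.4.6) jobs above generally follow this command-line shape; the JDBC URL, credentials, table, and check column are placeholders, and a Teradata source additionally needs the vendor connector on the Sqoop classpath:

    # Create a reusable Sqoop job that remembers the last imported value,
    # so each run pulls only rows added since the previous run.
    sqoop job --create orders_incremental -- import \
      --connect jdbc:teradata://dw-host/DATABASE=sales \
      --username etl_user -P \
      --table ORDERS \
      --target-dir /data/staging/orders \
      --incremental append \
      --check-column ORDER_ID \
      --last-value 0

    # Each scheduled run (crontab or an Oozie workflow) executes the saved job:
    sqoop job --exec orders_incremental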

Confidential, CA

Hadoop / Big Data Developer

Responsibilities:

  • Developed high-integrity programs, used in systems where predictable and highly reliable operation is essential, using Spark.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala (see the first sketch following this list).
  • Implemented Flume, Spark, and Spark Streaming frameworks for real-time data processing.
  • Developed analytical components using Scala, Spark, and Spark Streaming.
  • Analyzed the SQL scripts and designed the solution to implement using PySpark.
  • Developed several advanced MapReduce programs in Java and Python as part of functional requirements for big data.
  • Developed Hive (version 1.1.1) scripts as part of functional requirements and implemented Hadoop security with Kerberos.
  • Worked with the admin team on designing and executing the upgrade from CDH 3 to CDH 4.
  • Developed UDFs in Java as needed for use in Pig and Hive queries.
  • Assisted the Hadoop team with developing MapReduce scripts in Python.
  • Built, tuned, and maintained HiveQL and Pig scripts for loading, filtering, and storing data and for user reporting; involved in creating Hive tables, loading data, and writing Hive queries.
  • Handled importing of data from various data sources, performed transformations using Spark, and loaded the data into Cassandra (see the second sketch following this list).
  • Worked extensively on the Core, Spark SQL, and Spark Streaming modules of Spark.
  • Used Scala to write code for all Spark use cases.
  • Explored Spark to improve the performance and optimization of existing algorithms in Hadoop using SparkContext, Spark SQL, DataFrames, pair RDDs, and YARN.
  • Involved in Spark-Cassandra data modeling.
  • Developed a data pipeline using Flume, Sqoop, Pig, and Java MapReduce to ingest customer behavioral data and financial histories into HDFS for analysis.
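
A minimal Scala sketch of the Hive/SQL-to-Spark conversion pattern above, shown both as DataFrame operations and as the pair-RDD equivalent; the finance.transactions table and its columns are hypothetical:

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions.{avg, count}

    object SqlToSparkSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("SqlToSparkSketch")
          .enableHiveSupport()
          .getOrCreate()

        // Original HiveQL (hypothetical):
        //   SELECT account_id, COUNT(*) AS txns, AVG(amount) AS avg_amount
        //   FROM finance.transactions GROUP BY account_id
        // The same logic as DataFrame transformations:
        val byAccount = spark.table("finance.transactions")
          .groupBy("account_id")
          .agg(count("*").as("txns"), avg("amount").as("avg_amount"))

        // The pair-RDD equivalent: a keyed aggregation with aggregateByKey
        val stats = spark.table("finance.transactions").rdd
          .map(r => (r.getAs[Long]("account_id"), r.getAs[Double]("amount")))
          .aggregateByKey((0L, 0.0))(
            { case ((n, sum), amt)      => (n + 1, sum + amt) },    // within a partition
            { case ((n1, s1), (n2, s2)) => (n1 + n2, s1 + s2) })    // across partitions
          .mapValues { case (n, sum) => (n, sum / n) }              // (count, average)

        byAccount.show(20)
        stats.take(5).foreach(println)
        spark.stop()
      }
    }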
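And a sketch of the Spark-to-Cassandra load, assuming the DataStax spark-cassandra-connector is on the classpath; the host, keyspace, table, and Parquet source path are placeholders:

    import org.apache.spark.sql.SparkSession

    object CassandraLoadSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("CassandraLoadSketch")
          .config("spark.cassandra.connection.host", "cassandra-host") // placeholder
          .getOrCreate()

        // Hypothetical source: behavioral data already landed in HDFS as Parquet
        val events = spark.read.parquet("/data/ingest/customer_events")

        // Write through the connector's DataSource API; the keyspace and table
        // must already exist in Cassandra with a matching schema.
        events.write
          .format("org.apache.spark.sql.cassandra")
          .options(Map("keyspace" -> "analytics", "table" -> "customer_events"))
          .mode("append")
          .save()

        spark.stop()
      }
    }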

Environment: Core Java, multi-node installation, MapReduce, Spark, Kafka, Hive, Impala, Flume, Storm, Zookeeper, Oozie, Java, Python scripting, Scala, JDK, UNIX shell scripting, TestNG, MySQL, Eclipse, Toad, Tableau 8.x/9.x.

Confidential, CA

Sr. C++/Core Java/Script Developer

Responsibilities:

  • Coordinated with the onsite team for requirements gathering.
  • Involved in understanding business needs and formulated the low-level design for Hard-to-Borrow.
  • Created various use cases using massive public data sets, as well as various performance tests to verify the efficacy of codegen in different modes.
  • Researched, collated, and edited content related to Perl and tools; shared knowledge with peers.
  • Designed the state-machine framework for Hard-to-Borrow using the Singleton pattern in C++.
  • Created shared-object (.so) files for the Hard-to-Borrow business logic so that any client can invoke it.
  • Developed Perl jobs to insert and update SunGard data in the Sybase database in E*TRADE format.
  • Carried out unit testing on the developed code via a C++ tester application, covering all possible positive and negative scenarios to maintain code quality and performance.
  • Assigned logical units of work to each offshore developer (a team of 7 members).

Environment: C, C++, Core Java, Perl, Python, Linux, MySQL, Sybase.

Confidential

C++/C# developer

Responsibilities:

  • Coding/enhancement of programs.
  • Resolution and monitoring of problem tickets.
  • Supporting on call activities.
  • Status reporting, work planning, and documentation.

Environment: C, C++, C#, XML, Windows.

Confidential

C++/VC++ Developer

Responsibilities:

  • Coding/enhancement of programs.
  • Resolution and monitoring of problem tickets.
  • Supporting on call activities.
  • Status reporting, work planning, and documentation.

Environment: C++, C#, VC++, XML, MySQL, Oracle.
