Sr Hadoop Developer Resume

San Antonio, Texas

SUMMARY:

  • Hadoop Admin/Developer with 8+ years of IT experience, including hands-on experience in Big Data and in the design and development of Java-based applications.
  • Hands-on expertise in installing, configuring, supporting and managing Hadoop clusters using Apache and Cloudera (CDH3, CDH4) distributions, along with adequate experience on Hortonworks.
  • Skilled in a broad variety of tools in the Big Data stack such as Pig, Hive, HDFS, MapReduce, Sqoop, Spark, Kafka, YARN, Oozie and ZooKeeper.
  • Familiar with configuring, installing and managing Apache Hadoop in various distributions such as Cloudera, Hortonworks and MapR.
  • Good knowledge of querying with Spark SQL.
  • Good experience in programming real-time processing using Spark Streaming with Kafka.
  • Part of a team working with Spark for batch analysis on top of YARN.
  • Good experience in setting up and configuring Spark for testing and development.
  • Developed Spark jobs in Scala in a runtime test environment for faster, more efficient data retrieval and processing.
  • Good experience in data retrieval and processing using Hive and Pig.
  • Involved in transferring data from HDFS to RDBMS and vice versa using Sqoop.
  • Extensive experience in developing MapReduce programs on Apache and Cloudera distributions.
  • Working experience with Linux distributions such as Red Hat and CentOS.
  • Good experience in using Flume to gather and transfer large amounts of data from application servers and to model a variety of data.
  • Good experience in creating and designing data ingest pipelines using technologies such as Apache Storm and Kafka.
  • Developed Scala scripts using both DataFrames/SQL/Datasets and RDD/MapReduce in Spark for data aggregation and queries, and wrote data back into the OLTP system through Sqoop.
  • Experienced in handling large datasets during the ingestion process itself using partitions, Spark in-memory capabilities, broadcasts, efficient joins, transformations and other techniques.
  • Designed, developed and maintained data integration programs in a Hadoop and RDBMS environment with both traditional and non-traditional source systems, as well as RDBMS and NoSQL data stores, for data access and analysis.
  • Good knowledge of and experience with NoSQL databases such as HBase.
  • Part of the team responsible for log file organization: logs more than 7 days old were removed from the log folder, loaded into HDFS and stored for approximately 3 months.
  • Good experience in web technologies including Servlets, Java, J2EE, JSP, JDBC and JavaBeans.
  • Created and developed attractive designs for dynamic websites using AJAX and tested them using SOAP web technologies.
  • Good knowledge of firewall and Azure technologies.
  • Strengths include handling a variety of software systems, the capacity to learn and adapt to new technologies, being an amicable and curriculum-focused team player, and strong personal, technical and communication skills.

TECHNICAL SKILLS:

Big Data (Hadoop Framework): Cloudera Distribution, HDFS, MapReduce, YARN, Pig, Hive, Flume, Oozie, ZooKeeper, HBase, Sqoop, Spark, Scala, Kafka, Storm, Apache Phoenix, DataNode, NameNode, ResourceManager

Databases: MySQL, Oracle (SQL, PL/SQL), IBM DB2, MS Access

NoSQL Databases: HBase, MongoDB 3.0.1, Cassandra

Languages: SQL, Java, J2EE, Python, Pig Scripting, C

Scripting Languages: JavaScript, JSP 2.0/1.2, jQuery, JSON, HTML5, Linux & Unix scripts, XML

ETL: Talend ETL, Talend Studio

Web Technologies: JSP, Servlets, JavaBeans, JDBC, AWT, Swing, JSF, XML, AWS, AJAX, SOAP, XSLT

IDE: Eclipse, NetBeans, IBM RAD

XML Technologies: XML, XSLT

Operating Systems: Windows XP, 2007 Professional, 7, 8 & 10, UNIX, Linux, CentOS, Ubuntu, Red Hat Linux

PROFESSIONAL EXPERIENCE:

Confidential, San Antonio, Texas

Sr Hadoop Developer

Responsibilities:

  • Developed a data pipeline using Flume and Sqoop to extract data from weblogs and store it in HDFS.
  • Wrote Hive UDFs and installed and configured Hive.
  • Involved in writing Hive queries to meet the business requirements of the clients.
  • Transferred processed data from HDFS to RDBMS or other external file systems using Sqoop.
  • Used Splunk add-ons to enhance data ingestion and analysis of log data.
  • Used the Spark API over Hortonworks HDP 2.2 and Hadoop YARN to perform analytics on data in Hive.
  • Experience with Hortonworks HDP 2.2 and Cloudera Manager administration, and in installing and updating Hadoop and its related components in single-node and multi-node cluster environments using Ambari, Cloudera Manager 2.x, 3.x, 4.x and Hortonworks.
  • Explored Spark to improve the performance and optimization of existing Hadoop algorithms using SparkContext, Spark SQL, DataFrames, pair RDDs and YARN.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala.
  • Responsible for gathering the business requirements for the Initial POCs to load the enterprise data warehouse data to Greenplum databases.
  • Fixed a bug in the TLS certificate checking code in Golang that prevented certificates with long chains from being used.
  • Worked on the implementation of a log producer in Scala that watches application logs, transforms incremental log entries and sends them to a Kafka- and ZooKeeper-based log collection platform.
  • Designed and developed various reports and dashboards using Tableau visualizations such as whisker plots, pie charts, heat maps and box plots.
  • Developed a data pipeline using Pig and Java MapReduce to ingest customer behavioral data and financial histories into HDFS for analysis.
  • Involved in design, development, performance tuning and optimization of Essbase databases.
  • Extracted and restructured data into MongoDB using the MongoDB import and export command-line utilities.
  • Implemented code according to coding standards and created AngularJS controllers that isolate scopes and perform operations.
  • Developed custom directives and services in AngularJS.
  • Developed and supported the extraction, transformation and load (ETL) process for a data warehouse from its OLTP systems using Ab Initio, and provided technical support and hands-on mentoring in the use of Ab Initio.
  • Used Python and Django to interface with the jQuery UI and manage the storage and deletion of content.
  • Developed entire frontend and backend modules using Python on Django Web Framework.
  • Involved in data modelling and data warehousing using ETL tools along with Teradata.
  • Developed Pig scripts for data change notification and delta record processing between newly arrived data and the data already existing in HDFS.
  • Involved in installing and configuring the Hadoop ecosystem and Cloudera Manager using the CDH3 distribution.
  • Designed and implemented large service-oriented architecture (SOA) based integration applications in an onshore/offshore model using an agile approach.
  • Created Hive tables, managed data loading and wrote Hive queries that run internally as MapReduce jobs.
  • Automated cloud deployments using Puppet, Python and AWS CloudFormation templates.
  • Involved in loading and transferring data into HDFS and Hive using Sqoop and Kafka.
  • Provided support for MapReduce programs running on the cluster.
  • Performed metadata mapping from legacy source systems to target database fields and was involved in creating Ab Initio DMLs.
  • Used Tableau for designing and implementing dashboards.
  • Processed and maintained Hadoop log files, moving logs older than 7 days out of the log folder and retaining them for about 3 months.
  • Participated in pivoting the HDFS data from rows to columns and vice versa.
  • Served as an additional project developer and business system analyst throughout the transition from waterfall to agile and across the business solution development life cycle (SDLC).
  • Carried out bot traffic filtering, transformations and event joins using Pig before loading the data into HDFS.
  • Designed and developed mappings, sessions and workflows in Informatica for data migration.

Environment: Hadoop, MapReduce, MongoDB, YARN, Hive, Pig, HBase, Kafka, Oozie, Sqoop, Ab Initio, Flume, Core Java, Cloudera, Django, Talend, Impala, Python, HDFS, Eclipse.

Confidential, Findlay, Ohio

Hadoop Developer

Responsibilities:

  • Responsible for maintaining the Hadoop cluster, including commissioning and decommissioning nodes.
  • Responsible for managing and reviewing data backups and handling log files.
  • Prior experience as a Hadoop developer enabled me to work alongside Hadoop developers in maintaining a scalable support infrastructure for Hadoop.
  • Involved in performing cluster failovers and in freezing and unfreezing Veritas Cluster; ran hastatus -summ on Solaris and clustat on Linux to help enable and disable clusters and to clear failed states.
  • Involved in configuring and routing the HDFS clusters on the cloud.
  • Monitored systems and services through the Ambari dashboard to keep the clusters available for the business.
  • Involved in forecasting analysis and verifying data from different sources.
  • Provided support and training to new and less senior associates in the development and administration of Hadoop clusters.
  • Worked on the implementation of a log producer in Scala that watches application logs, transforms incremental log entries and sends them to a Kafka- and ZooKeeper-based log collection platform.
  • Configured Splunk dashboard to view ingestion details.
  • Worked extensively with Jenkins for continuous integration and end-to-end automation of all builds and deployments.
  • Involved in designing, implementing and administering large (200+ node), highly available Hadoop clusters secured with Kerberos, using the Cloudera Hadoop distribution.
  • Good knowledge of file formats such as Avro and ORC.
  • Carried out daily activities such as adding and removing nodes, capacity planning for file uploads, cluster monitoring, and HDFS support and maintenance.
  • Periodically reviewed the log files created in the Hadoop ecosystem and was involved in decommissioning and registering free space for the Hadoop cluster.
  • Involved, with a team of 7, in the HA implementation of the secondary NameNode to avoid a single point of failure.
  • Installed the Hadoop cluster and extended the Hadoop ecosystem while integrating it with other systems.
  • Developed Spark scripts using Scala shell commands as per the requirements.
  • Experience with multiple Hadoop clusters secured using Kerberos and Sentry.
  • Skilled with multiple Hadoop distributions such as Apache, Cloudera and Hortonworks.
  • Used tools such as Cloudera Manager and Ganglia.

Environment: Hadoop, MapReduce, MongoDB, YARN, Hive, Pig, HBase, Kafka, Oozie, Sqoop, Ab Initio, Flume, Core Java, Cloudera, Django, Talend, Impala, Python, HDFS, Eclipse.

Confidential, Dallas, TX

Hadoop Developer

Responsibilities:

  • Involved in all phases of the Big Data Implementation including requirement analysis, design, development, building, testing, and deployment of Hadoop cluster in fully distributed mode
  • Created Linux and Python Scripts to automate the daily ingestion of raw data
  • Processed the raw data using Hive jobs and scheduled them with crontab
  • Developed Hive UDFs to get the MDK and GeoIP values
  • Moved data to the appropriate partition based on record-level timestamp (as we have more than one day's worth of data in log files)
  • Compressed transformed/enriched data files with BZip2Codec
  • Developed the regular expression (RegexSerDe) to incorporate the two different raw data sets
  • Provided daily production support to monitor and troubleshoot Hadoop/Hive jobs
  • Developed a housekeeping process to purge old data from the edge node and HDFS.
  • Implemented two different processes for internal and external weblogs
  • Managed and reviewed Hadoop log files.
  • Supported and troubleshot Hive programs running on the cluster
  • Involved in fixing issues arising out of duration testing
  • Handled structured, semi-structured and unstructured data
  • Automated the History and Purge Process

BNY Mellon, Philadelphia, Pennsylvania

Hadoop Consultant

Responsibilities:

  • Worked on a live Hadoop production CDH3 cluster with 35 nodes
  • Involved in developing and customizing MapReduce programs in Java.
  • Worked on the performance improvement of MapReduce applications and Pig scripts.
  • Worked with 25 TB of unstructured and semi-structured data
  • Designed and developed Pig data transformation scripts to work against semi-structured data from various data points and created a baseline.
  • Used Sqoop to import data from a DB2 system into HDFS
  • Experienced in creating Hive scripts for data analysts based on the requirements.
  • Good experience in troubleshooting performance issues of Map Reduce Jobs.
  • Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports for the BI team.
  • Created and maintained Technical documentation for launching Hadoop Clusters and for executing Hive queries and Pig Scripts.

Environment: Hadoop, MapReduce, HDFS, Sqoop, Hive, Pig, Ab Initio, Linux, XML, Eclipse, SQL Server, Python, Spark, Oracle, Flume, Cassandra.

Confidential - Phoenix, AZ

Hadoop Consultant

Responsibilities:

  • Responsible for building scalable distributed data solutions using Hadoop.
  • Worked on joining raw data with reference data using Pig scripting.
  • Responsible for cluster maintenance, adding and removing cluster nodes, cluster monitoring and troubleshooting, and managing and reviewing data backups and log files.
  • Analyzed data using Hadoop components Hive and Pig.
  • Loaded and transformed large sets of structured, semi-structured and unstructured data using Hadoop/Big Data concepts.
  • Configured Sqoop jobs to import data from RDBMS into HDFS using Oozie workflows.
  • Created HBase tables to store variable data formats coming from different portfolios.
  • Implemented HBase features such as compression and used them to design and build MapReduce jobs.
  • Worked on compression mechanisms to optimize MapReduce Jobs.
  • Implemented device-based business logic using Hive UDFs to perform ad hoc queries on structured data.
  • Worked extensively with Hive DDL and Hive Query Language (HQL).
  • Implemented dashboards that internally use Hive queries to perform analytics on structured, Avro and JSON data to meet business requirements.
  • Experienced in handling Avro and JSON data in Hive using Hive SerDes.
  • Worked with the HBase NoSQL database.
  • Continuously monitored and managed the Hadoop cluster using Cloudera Manager and worked on Oozie workflows to run multiple jobs.

Environment: HDFS, MapReduce, HBase, Hive, Impala, Pig, SQL, NoSQL, Cassandra, Cloudera Manager, Sqoop, Flume, Oozie.

Confidential

Java developer

Responsibilities:

  • Used CVS for maintaining the source code.
  • Executed a project according to the Software Development Life Cycle (SDLC).
  • Implemented user interfaces using JSP, JavaScript and HTML.
  • Used JDBC for the mapping of an object-oriented domain model to a traditional relational database.
  • Created stored procedures to manipulate the database and apply the business logic according to the user's specifications.
  • Involved in database design and developing SQL queries for the maintenance of procedures on MySQL.
  • Built the front end using JSP, JSTL, jQuery and AJAX.
  • Created tables and maintained the relations between them by extensively using SQL.
  • Implemented exception handling using an exception management mechanism comprising exception handling application blocks.
  • Participated in creating generic classes containing regularly used functionality so that it can be reused.

Environment: Java (JDK 1.5), J2EE, Eclipse, JSP, JavaScript, Ajax, Log4j, CSS, XML, Spring, EJB, MDB, WebLogic, REST, JUnit, Maven, JIRA, SVN.
