Hadoop/Spark Developer Resume
Carmel, IN
SUMMARY
- 6+ years of professional experience in Information Technology, including 3+ years in Big Data and Hadoop ecosystem technologies.
- Experience working with BI teams to transform big data requirements into Hadoop-centric solutions.
- Excellent understanding of Hadoop architecture and its components, including HDFS, JobTracker, TaskTracker, NameNode, DataNode, and the Hadoop MapReduce programming paradigm.
- Extensive experience with both MapReduce MRv1 and MRv2 (YARN).
- Expertise in Hadoop, MapReduce, Spark (Scala), YARN, Spark Streaming, Hive, Pig, HBase, Kafka, Cassandra, and Oracle.
- Hands-on experience with the Hadoop ecosystem: HDFS, MapReduce, Hive, Pig, Oozie, Flume, and ZooKeeper.
- Writing ETL procedures to move large data sets from legacy systems into the new system schema using Oracle SQL Developer.
- Creating ad-hoc reports using SQL written in SQL Developer.
- Hands-on experience with major Hadoop ecosystem components, including Flume, Kafka, Oozie, ZooKeeper, MapReduce, and Cassandra.
- Expertise in designing and developing distributed processing systems feeding into a data warehousing platform for reporting.
- Imported and exported data into HDFS and Hive using Sqoop.
- Experience in loading streaming log data from various web servers into HDFS using Flume.
- Performed data analytics using Pig and Hive for data architects and data scientists within the team.
- Expertise in writing MapReduce programs and UDFs for both Hive and Pig in Java (see the UDF sketch after this summary).
- Expertise in job workflow scheduling and monitoring tools such as Oozie, and external schedulers such as Autosys and cron jobs.
- Worked on NoSQL databases such as HBase and Cassandra.
- Experience in transferring data between HDFS and relational databases with Sqoop.
- Experience in writing numerous test cases using the JUnit framework.
- Strong knowledge of the full software development life cycle: analysis, design, architecture, development, and maintenance.
- Ability to adapt to evolving technology, strong sense of responsibility and accomplishment.
- Excellent analytical, problem-solving, and communication skills with the ability to work as part of a team as well as independently.
- Good understanding of distributed systems and parallel processing architectures.
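Illustrative of the Hive and Pig UDF work mentioned above, a minimal Hive UDF sketch in Java (class, function, and JAR names are hypothetical, not taken from an actual project):

```java
import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

// Simple Hive UDF: trims and lower-cases a string column.
// After packaging: ADD JAR udfs.jar;
//                  CREATE TEMPORARY FUNCTION normalize_str AS 'NormalizeString';
public class NormalizeString extends UDF {
    public Text evaluate(Text input) {
        if (input == null) {
            return null;
        }
        return new Text(input.toString().trim().toLowerCase());
    }
}
```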
TECHNICAL SKILLS
Programming Languages: Java, C, SQL, HQL, Scala, Pig Latin.
Big Data Technologies: HDFS, Hive, Impala, MapReduce, Pig, Sqoop, Oozie, Kafka, ZooKeeper, YARN, Avro, Spark.
Front End Technologies: HTML, XHTML, CSS, XML, JavaScript.
Java Frameworks: MVC, Apache Struts 2.0, Spring, and Hibernate.
Build and Deployment (CI/CD): Apache Maven, Jenkins, GitHub, SVN, Nexus, Puppet.
Databases: Oracle 11g, MySQL, MS SQL Server, Teradata.
NoSQL Databases: HBase, Cassandra, MongoDB.
IDE: Eclipse, NetBeans, JBuilder.
RDBMS: MS Access, MS SQL Server, MySQL, IBM DB2, PL/SQL.
Operating Systems: Linux, Windows, Mac
Networks: HTTP, HTTPS, FTP, UDP, TCP/IP, SNMP.
PROFESSIONAL EXPERIENCE
Confidential, Carmel, IN
Hadoop/ Spark Developer
Responsibilities:
- Involved in the high-level design of the Hadoop 2.6.3 architecture for the existing data structures and problem statement; set up a new cluster and configured the entire Hadoop platform.
- Responsible for building scalable distributed data solutions using Hadoop.
- Implemented a data interface to fetch client data using a REST API, pre-processed the data using MapReduce 2.0, and stored it in HDFS (Hortonworks).
- Extracted data from MySQL, Oracle, and Teradata through Sqoop 1.4.6, placed it in HDFS (Cloudera distribution), and processed it.
- Worked with different HDFS file formats such as Avro 1.7.6, SequenceFile, and JSON, and different compression formats such as Snappy and bzip2.
- Developed efficient MapReduce programs for filtering unstructured data and built numerous MapReduce jobs to perform data cleansing and preprocessing on Cloudera.
- Involved in converting MapReduce jobs into Spark RDD transformations using Python.
- Continuously monitored and managed the Hadoop cluster using Ambari.
- Used Pig to perform data validation on the data ingested via Sqoop and Flume; the cleansed data set was pushed into Hive.
- Collected and aggregated large volumes of log data using Apache Spark and staged the data in HDFS for further analysis.
- Designed and built the reporting application, which uses Spark SQL to fetch and generate reports on HBase table data.
- Developed custom Unix shell scripts for pre- and post-validation of master and slave nodes when configuring the NameNode and DataNodes respectively.
- Developed a data pipeline using Kafka to ingest behavioral data and used Spark Streaming for processing (see the sketch after this list).
- Configured a 7-node Kafka platform with 2 web servers, 3 Kafka brokers, and 2 Kafka consumers running Spark Streaming (DataFrames), plus 2 ZooKeeper nodes, where the Kafka brokers were able to sustain 1 million writes (messages) per second.
- Developed Pentaho Kettle graphs to cleanse and transform the raw data into useful information, load it into Kafka queues, and load it onward into HDFS and a Neo4j database for the UI team to display through the web application.
- Automated the extraction of data from warehouses and weblogs into Hive tables by creating workflows and coordinator jobs in Oozie.
- Developed small distributed applications in our projects using ZooKeeper 3.4.7 and scheduled the workflows using Oozie 4.2.0.
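A minimal sketch of the Kafka-to-Spark Streaming ingestion described above, using the spark-streaming-kafka-0-10 Java API (broker addresses, topic name, and HDFS path are hypothetical):

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaInputDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;
import org.apache.spark.streaming.kafka010.ConsumerStrategies;
import org.apache.spark.streaming.kafka010.KafkaUtils;
import org.apache.spark.streaming.kafka010.LocationStrategies;

public class BehavioralEventStream {
    public static void main(String[] args) throws InterruptedException {
        SparkConf conf = new SparkConf().setAppName("BehavioralEventStream");
        JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(10));

        Map<String, Object> kafkaParams = new HashMap<>();
        kafkaParams.put("bootstrap.servers", "broker1:9092,broker2:9092,broker3:9092"); // hypothetical brokers
        kafkaParams.put("key.deserializer", StringDeserializer.class);
        kafkaParams.put("value.deserializer", StringDeserializer.class);
        kafkaParams.put("group.id", "behavioral-ingest");
        kafkaParams.put("auto.offset.reset", "latest");

        // Direct stream from the hypothetical "behavioral-events" topic.
        JavaInputDStream<ConsumerRecord<String, String>> stream =
            KafkaUtils.createDirectStream(
                jssc,
                LocationStrategies.PreferConsistent(),
                ConsumerStrategies.<String, String>Subscribe(
                    Collections.singletonList("behavioral-events"), kafkaParams));

        // Keep only the message payloads and persist each micro-batch to HDFS.
        stream.map(ConsumerRecord::value)
              .foreachRDD((rdd, time) ->
                  rdd.saveAsTextFile("hdfs:///data/behavioral/" + time.milliseconds()));

        jssc.start();
        jssc.awaitTermination();
    }
}
```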
Environment: Hadoop, HDP, Hive, Oozie, Hortonworks Sandbox, Java, Eclipse Luna, ZooKeeper, JSON file format, Spark.
Confidential, Dallas, TX
Hadoop / Admin Developer
Responsibilities:
- Imported data using Sqoop to load data from MySQL into HDFS on a regular basis.
- Involved in collecting and aggregating large volumes of log data using Apache Flume and staging the data in HDFS for further analysis.
- Pushed data from Amazon S3 storage to Redshift using key-value pairs as required by the BI team.
- Processed data using Athena on S3; worked on gateway nodes and connectors (JAR files) interfacing sources with the AWS cloud.
- Developed efficient MapReduce programs for filtering unstructured data and built various MapReduce jobs to perform data cleansing and preprocessing on EMR.
- Collected and aggregated large volumes of web log data from various sources, such as web servers, mobile, and network devices, using Apache Kafka, and stored the data in HDFS for analysis.
- Developed various Kafka producers and consumers from scratch, implemented according to the organization's requirements (a producer sketch follows this list).
- Set up Flume for various sources to deliver log messages from external systems to Hadoop HDFS.
- Responsible for creating and modifying topics (Kafka queues) as and when required, with varying configurations including replication factors, partitions, and TTL.
- Wrote and tested complex MapReduce jobs for aggregating identified and validated data.
- Created managed and external Hive tables with static/dynamic partitioning (a dynamic-partition sketch also follows this list).
- Wrote Hive queries for data analysis to meet the business requirements.
- Improved HiveQL performance by splitting larger queries into smaller ones and introducing temporary tables between them.
- Extensively involved in performance tuning of HiveQL by bucketing large Hive tables.
- Used an open-source web scraping framework for Python to crawl and extract data from web pages.
- Optimized Hive queries by setting different combinations of Hive parameters.
- Developed UDFs (user-defined functions) to extend the core functionality of Pig and Hive as required.
- Extensive involvement in writing Pig scripts to transform raw data from several data sources into a standard format.
- Implemented workflows using Oozie for running MapReduce jobs and Hive queries.
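A minimal sketch of a Kafka producer of the kind described above (broker addresses, topic name, and sample record are hypothetical):

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class WebLogProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker1:9092,broker2:9092"); // hypothetical brokers
        props.put("acks", "all");                                    // wait for full replication
        props.put("retries", 3);
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        try (Producer<String, String> producer = new KafkaProducer<>(props)) {
            // Key by source host so records from one host land in one partition.
            producer.send(new ProducerRecord<>("weblogs", "webserver-01",
                    "127.0.0.1 - - [10/Oct/2016:13:55:36] \"GET /index.html HTTP/1.1\" 200"));
        }
    }
}
```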
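And a minimal sketch of creating external/partitioned Hive tables with a dynamic-partition insert over Hive JDBC, as described above (server address, table names, and columns are hypothetical):

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class HivePartitionLoad {
    public static void main(String[] args) throws Exception {
        Class.forName("org.apache.hive.jdbc.HiveDriver");
        try (Connection conn = DriverManager.getConnection("jdbc:hive2://hiveserver:10000/default", "", "");
             Statement stmt = conn.createStatement()) {
            // Dynamic partitioning must be enabled for the session before the insert.
            stmt.execute("SET hive.exec.dynamic.partition = true");
            stmt.execute("SET hive.exec.dynamic.partition.mode = nonstrict");
            // External table over the raw files; partitioned managed table for queries.
            stmt.execute("CREATE EXTERNAL TABLE IF NOT EXISTS raw_events (id STRING, payload STRING, event_date STRING) "
                    + "ROW FORMAT DELIMITED FIELDS TERMINATED BY '\\t' LOCATION '/data/raw_events'");
            stmt.execute("CREATE TABLE IF NOT EXISTS events (id STRING, payload STRING) "
                    + "PARTITIONED BY (event_date STRING) STORED AS ORC");
            // The trailing SELECT column populates the dynamic partition key.
            stmt.execute("INSERT INTO TABLE events PARTITION (event_date) "
                    + "SELECT id, payload, event_date FROM raw_events");
        }
    }
}
```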
Environment: Hadoop, MapReduce, Yarn, Hive, Pig, HBase, Oozie, Sqoop, Flume, Oracle 11g, Core Java, Cloudera HDFS, Eclipse.
Confidential, Houston, TX
Big Data Developer
Responsibilities:
- Developed Java web services as part of the functional requirements.
- Installed and configured Hadoop; responsible for maintaining the cluster and managing and reviewing Hadoop log files.
- Supported setting up the QA environment and updating configurations for executing scripts with Pig and Sqoop.
- Developed MapReduce programs in Java for data analysis and loaded data from various data sources into HDFS (a mapper sketch follows this list).
- Worked extensively on Cloudera to analyze data residing in HDFS using Hive and Pig.
- Created Pig Latin scripts to sort, group, join, and filter the enterprise data.
- Worked on large sets of structured, semi-structured, and unstructured data.
- Used Sqoop to import and export data between Oracle RDBMS and HDFS.
- Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs.
- Coordinated with business clients to gather business requirements.
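A minimal sketch of the kind of Java MapReduce cleansing/analysis job described above (the map-only structure, five-column layout, and paths are hypothetical):

```java
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class CleanseRecordsJob {

    // Map-only job: drop malformed rows; valid rows pass through trimmed.
    public static class CleanseMapper extends Mapper<LongWritable, Text, Text, NullWritable> {
        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String[] fields = value.toString().split("\\t");
            if (fields.length == 5 && !fields[0].isEmpty()) {        // hypothetical 5-column layout
                context.write(new Text(String.join("\t", fields).trim()), NullWritable.get());
            } else {
                context.getCounter("cleanse", "malformed").increment(1);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "cleanse-records");
        job.setJarByClass(CleanseRecordsJob.class);
        job.setMapperClass(CleanseMapper.class);
        job.setNumReduceTasks(0);                                    // map-only
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(NullWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```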
Environment: Hadoop, Hive, MapReduce, Pig, Sqoop, MySQL, HBase, Flume, Spark, Scala, Hortonworks Sandbox.
Confidential, AL Khobar, SA
Data Engineer
Responsibilities:
- Worked with a team to gather and analyze the client requirements.
- Analyzed large data sets distributed across a cluster of commodity hardware.
- Connected to the Hadoop cluster and Cassandra ring and executed test programs on the servers.
- Worked on Hadoop and Cassandra as part of the next-generation platform implementation.
- Developed several advanced MapReduce programs to process received data files.
- Handled importing of data from various data sources, performed transformations using Hive and MapReduce, loaded data into HDFS, and extracted data from Oracle into HDFS using Sqoop.
- Loaded the OLTP models and performed ETL to load dimension data for a star schema.
- Built a request builder, developed in Scala, to facilitate running scenarios using JSON configuration files.
- Analyzed the data by performing Hive queries and running Pig scripts to study customer behavior.
- Developed workflows in Oozie to automate the tasks of loading data into HDFS and pre-processing it with Pig.
- Developed Pig UDFs to pre-process the data for analysis (a UDF sketch follows this list) and migrated ETL operations into the Hadoop framework using Pig Latin scripts and Python 3.6.2 scripts.
- Used Pig as an ETL tool to perform transformations, event joins, filtering, and some pre-aggregations before storing the data in HDFS.
- Worked on creating MapReduce programs to parse the data for claim report generation and on running the JARs in Hadoop; coordinated with the Java team on creating MapReduce programs.
- Implemented the project using the Spring Web MVC module.
- Responsible for managing and reviewing Hadoop log files; designed and developed a data management system using MySQL.
- Worked with the application owners to understand the business requirements and mapped them into technical requirements.
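A minimal sketch of a Pig UDF for pre-processing of the kind described above (the class name and normalization logic are hypothetical):

```java
import java.io.IOException;
import org.apache.pig.EvalFunc;
import org.apache.pig.data.Tuple;

// Hypothetical pre-processing UDF: normalizes a free-text field before analysis.
public class NormalizeText extends EvalFunc<String> {
    @Override
    public String exec(Tuple input) throws IOException {
        if (input == null || input.size() == 0 || input.get(0) == null) {
            return null;
        }
        return input.get(0).toString().trim().toLowerCase().replaceAll("\\s+", " ");
    }
}
```

In Pig Latin, the packaged JAR would be loaded with REGISTER and the function invoked as NormalizeText(field).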
Environment: T-SQL, MS SQL Server 2014/2012, Visual Studio 2012, BIDS, SSIS, SSRS, Autosys, Team Foundation Server (TFS), VersionOne, SharePoint.
Confidential
Java Developer
Responsibilities:
- Developed technical design documents and created a prototype of the critical business application using Java/J2EE; initiated the use of HttpUnit and Selenium IDE for testing.
- Designed, developed, and implemented rich user interfaces for complex web-based systems using frameworks such as JSF.
- Also worked with Simple Network Management Protocol (SNMP), an Internet-standard protocol for managing devices on IP networks; devices that typically support SNMP include routers, switches, servers, workstations, printers, modem racks, and more.
- Also worked on network management stations (NMS).
- Analyzed and identified components for the Presentation, Business, Integration, Resource, and Service layers.
- Developed an administration portal using HTML5, Node.js, jQuery, and JavaScript frameworks such as Backbone.js and RequireJS.
- Worked with process owners and business stakeholders to translate business requirements into functional requirements within ServiceNow.
- Generated server side PL/SQL scripts for data manipulation and validation and materialized views for remote instances.
- Worked with GUI tools such as Kibana to view generated logs, and other tools such as Logstash and Elasticsearch for log management.
- Worked on Distribution Engine components for the Comcast agent application portal with Elasticsearch as the data store.
- Developed technical specifications for various back-end modules from business requirements; specifications were written according to standard specification formats.
- Developed BPEL processes to transfer vendors, customers, and items to keep the data in sync between Oracle EBS and WM using hierarchical queries.
- Designed and developed the DAO layer with Hibernate 3.0 standards to access data from an IBM DB2 database through a JPA (Java Persistence API) layer, creating object-relational mappings and writing PL/SQL procedures and functions (a DAO sketch follows this list).
- Integrated Spring injection for DAOs to achieve inversion of control, and updated Spring configurations for managing Java objects using callbacks.
- Designed and coded presentation (GUI) JSPs with Struts tag libraries for creating product service components (health care codes) using RAD.
- Coded Action classes, JavaBeans, service layers, and business delegates to implement business logic with the latest features of JDK 1.5, such as annotations and generics.
- Created test environments with WAS for local testing using a test profile, and interacted with the Software Quality Assurance (SQA) team to report and fix defects using Rational ClearQuest.
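A minimal sketch of a JPA-backed DAO of the kind described above (the Customer entity, fields, and query are hypothetical; assumes Hibernate as the JPA provider with the EntityManager injected by the container, e.g. via Spring's JPA support):

```java
import java.util.List;
import javax.persistence.Entity;
import javax.persistence.EntityManager;
import javax.persistence.GeneratedValue;
import javax.persistence.Id;
import javax.persistence.PersistenceContext;

// Hypothetical entity mapped to a DB2 table through JPA annotations.
@Entity
public class Customer {
    @Id @GeneratedValue
    private Long id;
    private String lastName;
    // getters and setters omitted for brevity
}

// DAO built on the JPA EntityManager, wired in through dependency injection.
class CustomerDao {

    @PersistenceContext
    private EntityManager entityManager;

    void save(Customer customer) {
        entityManager.persist(customer);
    }

    Customer findById(Long id) {
        return entityManager.find(Customer.class, id);
    }

    // JPQL query against the mapped entity rather than the underlying table.
    List<Customer> findByLastName(String lastName) {
        return entityManager
                .createQuery("SELECT c FROM Customer c WHERE c.lastName = :lastName", Customer.class)
                .setParameter("lastName", lastName)
                .getResultList();
    }
}
```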
Environment: Java, J2EE, JSP, Hibernate, Struts, XML Schema, SOAP, JavaScript, PL/SQL, JUnit, AJAX, HQL, HTML, JDBC, Maven, Eclipse