- 7 years of professional experience in Requirements Analysis, Design, Development and Implementation of Java, J2EE and Big Data technologies.
- 4+ years of exclusive experience in Big Data technologies and Hadoop ecosystem components like Spark, MapReduce, Hive, Pig, YARN, HDFS, Sqoop, Flume, Kafka and NoSQL systems like HBase, Cassandra.
- Strong knowledge of the architecture of distributed systems and parallel processing; in-depth understanding of the MapReduce framework and the Spark execution framework.
- Expertise in writing end-to-end data processing jobs to analyze data using MapReduce, Spark and Hive.
- Extensive experience in working with structured data using HiveQL, join operations and custom UDFs, and in optimizing Hive queries.
- Experience using various Hadoop distributions (Cloudera, Hortonworks, Amazon AWS) to fully implement and leverage new Hadoop features.
- Extensive experience in writing Pig scripts to transform raw data from several data sources into forming baseline data.
- Extensive experience in importing/exporting data between RDBMS and the Hadoop ecosystem using Apache Sqoop.
- Worked with the Java HBase API to ingest processed data into HBase tables.
- Strong experience in working with UNIX/LINUX environments, writing shell scripts.
- Good knowledge of and experience with the real-time streaming technologies Spark and Kafka.
- Experience in optimizing MapReduce jobs using Combiners and Partitioners to deliver the best results.
- Very good understanding of partitioning and bucketing concepts in Hive; designed both managed and external tables in Hive to optimize performance.
- Extensive experience in working with semi-structured and unstructured data by implementing complex MapReduce programs using design patterns.
- Sound knowledge of J2EE architecture, design patterns, objects modeling using various J2EE technologies and frameworks.
- Adept at creating Unified Modeling Language (UML) diagrams such as Use Case diagrams, Activity diagrams, Class diagrams and Sequence diagrams using Rational Rose and Microsoft Visio.
- Experienced in using Agile methodologies including extreme programming, SCRUM and Test Driven Development (TDD).
- Proficient in integrating and configuring the Object-Relational Mapping tool Hibernate in J2EE applications, along with other open-source frameworks like Struts and Spring.
- Experience in building and deploying web applications on multiple application servers and middleware platforms including WebLogic, WebSphere, Apache Tomcat and JBoss.
- Experience in writing test cases in Java Environment using JUnit.
- Hands on experience in development of logging standards and mechanism based on Log4j.
- Experience in building, deploying and integrating applications with ANT, Maven.
- Good knowledge of Web Services, SOAP programming, WSDL, XML parsers like SAX and DOM, and front-end technologies such as AngularJS and responsive design with Bootstrap.
- Demonstrated technical expertise, organization and client service skills in various projects undertaken.
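As a rough illustration of the combiner optimization mentioned above (local pre-aggregation on each mapper before the shuffle), here is a minimal word-count sketch in plain Python; the input splits are hypothetical and the cluster machinery is simulated in-process:

```python
from collections import Counter
from itertools import groupby
from operator import itemgetter

def map_phase(lines):
    """Map: emit one (word, 1) pair per token."""
    for line in lines:
        for word in line.split():
            yield word, 1

def combine(pairs):
    """Combiner: pre-aggregate counts per mapper to cut shuffle volume."""
    counts = Counter()
    for word, n in pairs:
        counts[word] += n
    return counts.items()

def reduce_phase(shuffled):
    """Reduce: sum the partial counts grouped by key."""
    ordered = sorted(shuffled, key=itemgetter(0))
    return {word: sum(n for _, n in group)
            for word, group in groupby(ordered, key=itemgetter(0))}

# Two hypothetical input splits; each "mapper" runs its own combiner.
split1 = ["spark hive spark", "hive"]
split2 = ["spark kafka"]
partials = list(combine(map_phase(split1))) + list(combine(map_phase(split2)))
print(reduce_phase(partials))  # {'hive': 2, 'kafka': 1, 'spark': 3}
```

Without the combiner, six (word, 1) pairs would cross the shuffle; with it, only four partial counts do.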
Hadoop/Big Data: HDFS, MapReduce, Hive, Pig, Oozie, Sqoop, Zookeeper, YARN, TEZ, Flume, Spark, Kafka
Java & J2EE Technologies: Core Java, Servlets, JSP, JDBC, JNDI and Java Beans
Databases: Teradata, Oracle 11g/10g, MySQL, DB2, SQL Server, NoSQL (HBase, MongoDB)
Programming Languages: Java, jQuery, Scala, Python, UNIX Shell Scripting
IDE: Eclipse, NetBeans, PyCharm
Integration & Security: MuleSoft, Oracle IDM & OAM, SAML, EDI, EAI
Build Management Tools: Maven, Apache Ant
Web Services: SOAP, REST
Predictive Modelling Tools: SAS Editor, SAS Enterprise Guide, SAS Miner, IBM Cognos
Scheduling Tools: crontab, AutoSys, Control-M
Visualization Tools: Tableau, Arcadia Data
Confidential, Plano, TX
- Expertise in designing and deploying Hadoop clusters and different Big Data analytic tools including Pig, Hive, HBase, Oozie, ZooKeeper, Sqoop, Flume, Spark, Impala and Cassandra with the Hortonworks distribution.
- Installed Hadoop, MapReduce, HDFS and AWS, and developed multiple MapReduce jobs in Pig and Hive for data cleaning and pre-processing.
- Assisted in upgrading, configuring and maintaining various Hadoop infrastructure components like Pig, Hive and HBase.
- Used the Spark API over Hortonworks Hadoop YARN to perform analytics on data in Hive.
- Explored Spark to improve the performance and optimization of existing algorithms in Hadoop using Spark Context, Spark SQL, DataFrames, pair RDDs and Spark on YARN.
- Developed Spark code using Scala and Spark SQL/Streaming for faster testing and processing of data.
- Imported data from different sources like HDFS/HBase into Spark RDDs.
- Built a POC on Single Member Debug on Hive/HBase and Spark.
- Configured, deployed and maintained multi-node Dev and Test Kafka clusters.
- Performed transformations, cleaning and filtering on imported data using Hive and MapReduce, and loaded the final data into HDFS.
- Loaded data into Spark RDDs and performed in-memory computation to generate the output response.
- Loaded data into HBase using both bulk and non-bulk loads.
- Experience with the Oozie workflow scheduler to manage Hadoop jobs as Directed Acyclic Graphs (DAGs) of actions with control flows.
- Expertise in data modeling and data warehouse design and development.
Environment: Hadoop, HDFS, MapReduce, Pig, Hive, Sqoop, Kafka, Solr, HBase, Oozie, Flume, Spark Streaming/SQL, Java, SQL Scripting, Linux Shell Scripting.
Confidential, Phoenix, AZ
- Installed and configured the Hadoop environment.
- Developed multiple MapReduce jobs in Java for data cleaning and preprocessing.
- Installed and configured Pig and wrote Pig Latin scripts.
- Used Pig and MapReduce to analyze XML files and log files.
- Imported data using Sqoop to load data from IBM DB2 into HDFS on a regular basis.
- Wrote Hive queries for data analysis to meet the business requirements.
- Created Hive tables and worked on them using HiveQL.
- Imported and exported data into HDFS and Hive using Sqoop from IBM DB2 and Netezza databases.
- Used Oozie workflows to coordinate Pig and Hive scripts.
- Used Impala for querying HDFS data to achieve better performance.
- Designed and implemented a MapReduce-based large-scale parallel relation-learning system.
- Set up and benchmarked Hadoop/HBase clusters for internal use.
- Developed UDFs in both Pig and Hive to pre-process the data and compute various metrics for reporting.
- Developed a MapReduce program to convert mainframe fixed-length data to delimited data.
- Ingested data from various IBM DB2 tables to HDFS using Sqoop.
- Automated Python scripts to pull and synchronize code in the GitHub environment.
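The fixed-length-to-delimited conversion mentioned above boils down to slicing each mainframe record by its column widths and re-joining the fields with a delimiter. A minimal Python sketch, with a hypothetical copybook layout (NAME 10 chars, AGE 3 chars, ZIP 5 chars):

```python
def fixed_to_delimited(record, widths, delimiter="|"):
    """Slice a fixed-width record into fields by column width and join
    them with a delimiter, trimming the padding spaces."""
    fields, pos = [], 0
    for w in widths:
        fields.append(record[pos:pos + w].strip())
        pos += w
    return delimiter.join(fields)

# Hypothetical record: NAME(10) + AGE(3) + ZIP(5) = 18 characters.
record = "JOHN DOE  04210001"
print(fixed_to_delimited(record, [10, 3, 5]))  # JOHN DOE|042|10001
```

In a real MapReduce job this function would sit in the mapper, applied once per input line; the widths would come from the copybook definition.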
Environment: Hadoop, CDH, MapReduce, HDFS, Pig, Hive, Oozie, Java, UNIX, Flume, Impala, HBase, Oracle, MapR, AutoSys, Mainframes, JCL, IBM DB2, NDM.
Confidential, Columbus, OH
- Involved in requirement analysis, design, coding and implementation.
- Worked in Agile methodology and used JIRA to maintain project stories.
- Analyzed large data sets by running Hive queries.
- Involved in designing and developing the Hive data model, loading it with data and writing Java UDFs for Hive.
- Handled importing and exporting data into HDFS, analyzed the data using MapReduce and Hive, and produced summary results from Hadoop for downstream systems.
- Used Sqoop to import and export data between the Hadoop Distributed File System (HDFS) and RDBMS.
- Created Hive tables and loaded data from HDFS into Hive tables as per the requirement.
- Built custom MapReduce programs to analyze data and used HQL queries to clean unwanted data.
- Created components like Hive UDFs for missing functionality in Hive to analyze and process the large volumes of data.
- Worked on various performance optimizations like using the distributed cache for small datasets, partitioning and bucketing in Hive, and map-side joins.
- Involved in writing complex queries to perform join operations between multiple tables.
- Actively verified and tested data in HDFS and Hive tables while Sqooping data from Hive to RDBMS tables.
- Developed scripts and scheduled AutoSys jobs to filter the data.
- Monitored AutoSys file-watcher jobs, tested data for each transaction and verified whether each run completed properly.
- Created and maintained technical documentation for launching Hadoop clusters and for executing Hive queries and Pig scripts.
- Used Impala to pull data from Hive tables.
- Used Apache Maven 3.x to build and deploy the application to various environments. Installed the Oozie workflow engine to run multiple Hive jobs independently based on time and data availability.
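The map-side join optimization mentioned above can be sketched in plain Python: the small table is held in memory as a dict (the distributed-cache analogue), so each mapper joins its rows locally and no shuffle is needed. The table contents here are hypothetical:

```python
def map_side_join(large_rows, small_table):
    """Map-side (broadcast) join: load the small table into an in-memory
    dict, then stream the large table and join each row locally."""
    lookup = dict(small_table)  # stands in for the distributed cache
    for key, value in large_rows:
        if key in lookup:  # inner join: drop rows with no match
            yield key, value, lookup[key]

# Hypothetical tables: a large orders stream and a small customer lookup.
orders = [("c1", 100), ("c2", 250), ("c3", 75)]
customers = [("c1", "Alice"), ("c3", "Cleo")]
print(list(map_side_join(orders, customers)))
# [('c1', 100, 'Alice'), ('c3', 75, 'Cleo')]
```

This mirrors what Hive does with a MAPJOIN hint (or `hive.auto.convert.join`) when one side of the join fits in memory.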
Environment: HDFS, Hadoop, Pig, Hive, Sqoop, Flume, MapReduce, Oozie, MongoDB, Java 6/7, Oracle 10g, Subversion, Toad, UNIX Shell Scripting, SOAP, REST services, Agile Methodology, JIRA, AutoSys.