Big Data / Hadoop Developer Resume
Dallas, TX
SUMMARY
- 6 years of IT experience in software analysis, design, development, testing and implementation of Big Data, Hadoop, NoSQL and Java/J2EE technologies
- 3+ years of hands-on experience with Big Data ecosystems including Hadoop (1.0 and YARN), Tableau, MapReduce, Pig, Hive, Impala, Sqoop, Flume, Oozie, MongoDB, ZooKeeper, Kafka, Maven, Spark, Scala, HBase, Cassandra (CQL)
- Experience in installation, configuration and deployment of Big Data solutions
- Excellent knowledge of Hadoop ecosystem architecture and components such as Hadoop Distributed File System (HDFS), MRv1, MRv2, JobTracker, TaskTracker, NameNode, DataNode, ResourceManager, NodeManager and MapReduce programming
- Experience in analyzing data using Hive UDFs, Hive UDTFs and custom MapReduce programs in Java
- Strong command of Hive and Pig core functionality; wrote Pig Latin UDFs in Java and used various UDFs from Piggybank and other sources
- Hands-on experience with NoSQL databases like HBase and Cassandra and relational databases like Oracle and MySQL
- Worked in Agile/Scrum software development.
- Responsible for committing scripts to a GitHub version control repository and deploying the code using Jenkins.
- Proficient in configuring an active audit framework before ingesting files into HDFS by enabling filename, record count, file size, duplicate, missing file and zero-byte checks; enabled the passive audit check after ingesting data into external Hive tables by matching the source file count against the Hive table count.
- Primarily responsible for designing, implementing, testing and maintaining database solutions for Azure.
- Primarily involved in the data migration process using Azure, integrating with a GitHub repository and Jenkins.
- Hands-on experience with real-time streaming into HDFS using Kafka and Spark Streaming
- Implemented predefined operators in Spark such as map, reduce, sample, filter, count, cogroup, groupBy, sort, reduceByKey, take, groupByKey, union, leftOuterJoin and rightOuterJoin.
- Developed analytical components using Spark SQL and Spark Streaming.
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Spark SQL in Scala
- Deeply involved in writing complex Spark/Scala scripts: wrote UDFs, worked with the Spark context and Cassandra SQL context, used APIs and methods that support DataFrames, RDDs, DataFrame joins and Cassandra table joins, and finally wrote/saved the DataFrames/RDDs to a Cassandra database (a minimal sketch follows this summary).
- Proficient in Java, Collections, J2EE, Servlets, JSP, Spring, Hibernate, JDBC/ODBC
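Below is a minimal, illustrative sketch of the Hive-to-Spark-SQL conversion and Cassandra write described above. The database, keyspace and table names (claims_db.claims, analytics.claims_by_member) and the Cassandra host are hypothetical, and it assumes Spark 1.x with the DataStax spark-cassandra-connector on the classpath:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SaveMode
import org.apache.spark.sql.hive.HiveContext

object HiveToCassandraSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("HiveToCassandraSketch")
      .set("spark.cassandra.connection.host", "127.0.0.1") // assumed Cassandra host
    val sc = new SparkContext(conf)
    val hiveCtx = new HiveContext(sc)

    // Equivalent of a HiveQL aggregation, expressed through Spark SQL
    // (claims_db.claims is a hypothetical Hive table).
    val claimsByMember = hiveCtx.sql(
      """SELECT member_id, count(*) AS claim_cnt
        |FROM claims_db.claims
        |GROUP BY member_id""".stripMargin)

    // Write the resulting DataFrame to Cassandra via the DataStax connector
    // (analytics.claims_by_member is a hypothetical keyspace/table).
    claimsByMember.write
      .format("org.apache.spark.sql.cassandra")
      .options(Map("keyspace" -> "analytics", "table" -> "claims_by_member"))
      .mode(SaveMode.Append)
      .save()

    sc.stop()
  }
}
```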
TECHNICAL SKILLS
Hadoop Ecosystem: Hadoop, HDFS, MapReduce, Hive, Impala, Pig, Sqoop, Oozie, Zena and Zeke scheduling, ZooKeeper, Flume, Kafka, Spark Core, Spark SQL, Spark Streaming, AWS, Azure
NoSQL Databases: HBase, Cassandra, MongoDB
Build Management Tools: Maven, Apache Ant
Java & J2EE Technologies: Core Java, Servlets, JSP, JDBC, JNDI, Java Beans
Languages: C, C++, Java, SQL, PL/SQL, Pig Latin, HiveQL, UNIX shell scripting
Frameworks: MVC, Spring, Hibernate, Struts 1/2, EJB, JMS, JUnit, MRUnit
Version Control & CI: GitHub, Jenkins
Databases: Oracle 8i/9i/10g, Microsoft SQL Server, DB2 & MySQL 4.x/5.x
PROFESSIONAL EXPERIENCE
Confidential, Dallas, TX
Big Data / Hadoop Developer
Responsibilities:
- Created a Zeke event in the FTP process that triggers at the end of the mainframe JCL job for the Stoploss project; the Zeke event in turn triggers the data lake Zena ingestion process.
- Involved in creating JavaScript to enable the process variable that triggers consumption, and enabled the date/timestamp partition.
- Responsible for configuring the active audit framework before ingesting files into HDFS by enabling filename, record count, file size, duplicate, missing file and zero-byte checks; the passive audit check is enabled after ingesting data into external Hive tables by matching the source file count against the Hive table count (a sketch of this check follows the list below).
- Ingested the contract, commission and CVS claims historical files as a one-time load into the incoming raw layer of HDFS, and scheduled the incremental data in the Zena scheduler by date/timestamp partition.
- Added data to new partitions in the external Hive staging table so it could be read by partition, and loaded the external Hive ORC tables with Snappy compression using Pig HCatalog scripts.
- Applied business rules to the data transformations as per requirements and made the data available to downstream consumption teams.
- Worked on the Walgreens Member Search project under tight timelines; configured the ingestion process, applied business requirements in the data transformations by eliminating header data from control files, and exported the processed data from the HDFS outgoing layer to ADW.
- Used Jira for Scrum issue tracking and release management.
- Responsible for moving the ingestion scripts into a GitHub version control repository and deploying the scripts using Jenkins.
- Primarily involved in the data migration process using Azure, integrating with a GitHub repository and Jenkins.
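The passive audit check mentioned above can be sketched as follows; this is only an illustration, with hypothetical HDFS paths and table names (staging.contracts_ext), and assumes a Spark 1.x HiveContext rather than the exact framework used on the project:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

object PassiveAuditCheck {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("PassiveAuditCheck"))
    val hiveCtx = new HiveContext(sc)

    // Hypothetical raw-layer path and external staging table for one partition.
    val sourcePath = "hdfs:///data/incoming/raw/contracts/load_dt=2016-01-01"
    val sourceCount = sc.textFile(sourcePath).count()

    val tableCount = hiveCtx
      .sql("SELECT count(*) FROM staging.contracts_ext WHERE load_dt = '2016-01-01'")
      .collect()(0)
      .getLong(0)

    // Passive audit: the file record count must match the Hive table count.
    if (sourceCount == tableCount)
      println(s"Passive audit passed: $sourceCount records")
    else
      println(s"Passive audit FAILED: file=$sourceCount, hive table=$tableCount")

    sc.stop()
  }
}
```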
Environment: Hadoop, HDFS, Pig, Hive, Java, Sqoop, HBase, Zena Scheduler, Jira, GitHub, Jenkins, Azure
Confidential, Dallas, TX
Big Data / Hadoop Developer
Responsibilities:
- Configured Flume and Kafka to capture data from various sources such as clickstream data and Twitter feeds
- Involved in data ingestion from relational databases into HDFS using Sqoop
- Performed data cleansing and enrichment using Pig Latin and HiveQL
- Built exception files for all non-compliant data using Pig
- Responsible for managing data from various sources
- Created Hive external tables for semantic data, loaded the data into the tables and queried the data using HQL
- Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting (see the sketch after this list)
- Worked with different data sources like Avro data files, XML files, JSON files, SQL Server and Oracle to load data into Hive tables
- Worked on Hive to expose data for further analysis and to transform files from different analytical formats to text files
- Built business metrics as part of the target platform using HiveQL
- Generated final reporting data in Tableau for testing by connecting to the corresponding Hive tables using the Hive ODBC connector
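A hedged sketch of the partitioned/bucketed Hive table and reporting metric described above, issued through a Spark HiveContext so the example stays in Scala; the semantic.click_events table, its columns and its location are hypothetical:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

object HiveMetricsSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("HiveMetricsSketch"))
    val hiveCtx = new HiveContext(sc)

    // Hypothetical semantic-layer table, partitioned by event date and bucketed by user.
    hiveCtx.sql(
      """CREATE EXTERNAL TABLE IF NOT EXISTS semantic.click_events (
        |  user_id STRING, page STRING, ts BIGINT)
        |PARTITIONED BY (event_dt STRING)
        |CLUSTERED BY (user_id) INTO 32 BUCKETS
        |STORED AS ORC
        |LOCATION '/data/semantic/click_events'""".stripMargin)

    // Example business metric: daily distinct visitors for reporting.
    val dailyVisitors = hiveCtx.sql(
      """SELECT event_dt, count(DISTINCT user_id) AS visitors
        |FROM semantic.click_events
        |GROUP BY event_dt""".stripMargin)
    dailyVisitors.show()

    sc.stop()
  }
}
```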
Environment: Hadoop, HDFS, Pig, Hive, Java, Sqoop, Kafka, HBase, NoSQL, Oracle 10g, PL/SQL, SQL Server, Windows NT, Tableau.
Confidential, Dallas, TX
Big Data / Hadoop Developer
Responsibilities:
- Created a port for live streaming; data is consumed by the Spark streaming context
- Used Maven to build and deploy the jar for spark-submit; the streaming job uses a sliding window interval of 5 seconds (see the sketch after this list)
- Used the Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive.
- The generated output is stored and used to create Spark DataFrames for further analysis
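A minimal sketch of this kind of socket-based streaming job with a 5-second sliding window; the host, port and word-count logic are placeholders, since the actual source and computation are not specified above:

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object PortStreamingSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("PortStreamingSketch")
    val ssc = new StreamingContext(conf, Seconds(1)) // assumed 1-second batch interval

    // Hypothetical host/port for the live stream.
    val lines = ssc.socketTextStream("localhost", 9999)

    // Count words over a 5-second sliding window.
    val windowedCounts = lines
      .flatMap(_.split("\\s+"))
      .map(word => (word, 1))
      .reduceByKeyAndWindow(_ + _, Seconds(5), Seconds(5))

    windowedCounts.print()

    ssc.start()
    ssc.awaitTermination()
  }
}
```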
Environment: Spark Streaming, Scala, Maven (web log streaming using Spark)
Confidential, Dallas, TX
Big Data / Hadoop Developer
Responsibilities:
- Used a Flume agent to simulate Confidential log files as the source, with a Spark sink as the sink
- Generated live streaming with a sliding window interval of 10 seconds
- Custom Scala functions were added to the source program for multiple operations
- The generated output is transformed into Spark DataFrames/RDDs and written to a Cassandra database (see the sketch below)
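A hedged sketch of this Flume-to-Spark-to-Cassandra pipeline, using the pull-based Flume Spark sink, a 10-second window and the DataStax spark-cassandra-connector; the host, port, keyspace, table and column names are hypothetical:

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.flume.FlumeUtils
import com.datastax.spark.connector._

object FlumeToCassandraSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("FlumeToCassandraSketch")
      .set("spark.cassandra.connection.host", "127.0.0.1") // assumed Cassandra host
    val ssc = new StreamingContext(conf, Seconds(2))

    // Pull events from the Flume SparkSink assumed to run on localhost:4545.
    val flumeStream = FlumeUtils.createPollingStream(ssc, "localhost", 4545)

    // Decode event bodies and count identical log lines over a 10-second window.
    val lineCounts = flumeStream
      .map(event => new String(event.event.getBody.array(), "UTF-8"))
      .map(line => (line, 1L))
      .reduceByKeyAndWindow(_ + _, Seconds(10), Seconds(10))

    // Save each windowed batch to Cassandra (hypothetical keyspace/table/columns).
    lineCounts.foreachRDD { rdd =>
      rdd.saveToCassandra("logs_ks", "line_counts", SomeColumns("line", "cnt"))
    }

    ssc.start()
    ssc.awaitTermination()
  }
}
```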
Environment: Flume, Spark, Scala, Maven, Cassandra
Confidential
Hadoop developer
Responsibilities:
- Worked on analyzing the Hadoop stack and different big data tools including Pig, Hive, the HBase database and Sqoop
- Worked on importing and exporting data from Oracle and DB2 into HDFS using Sqoop
- Developed Pig Latin scripts to extract data from the web server output files and load it into HDFS
- Designed and developed user-defined functions to provide custom Hive and Pig capabilities across the application teams (see the sketch after this list)
- Created Hive external tables, loaded the data into the tables and queried the data using HQL
- Collected log data from web servers and integrated it into HDFS using Flume
- Worked on Impala to expose data for further analysis and to transform files from different analytical formats to text files
- Implemented test scripts to support test driven development and continuous integration
- Worked on tuning the performance of Hive and Pig queries
- Supported in setting up QA environment and updating configurations for implementing scripts with Pig, Hive and Sqoop
- Worked in Agile/Scrum software development.
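A minimal sketch of a custom Hive UDF of the kind described above, written in Scala (rather than the project's Java) so all examples here share one language; the function name and behaviour are purely illustrative:

```scala
import org.apache.hadoop.hive.ql.exec.UDF
import org.apache.hadoop.io.Text

// Hypothetical UDF: trim input and keep the first five characters of a zip code.
class NormalizeZip extends UDF {
  // Hive calls evaluate() once per row; handle nulls defensively.
  def evaluate(input: Text): Text = {
    if (input == null) null
    else new Text(input.toString.trim.take(5))
  }
}

// Registered in Hive roughly like:
//   ADD JAR /path/to/udfs.jar;
//   CREATE TEMPORARY FUNCTION normalize_zip AS 'NormalizeZip';
//   SELECT normalize_zip(zip_code) FROM members;
```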
Environment: HDFS, Java, MapReduce, Pig, Hive, Impala, HBase, Oozie, Sqoop, Flume, Linux.
Confidential
Java Developer
Responsibilities:
- Worked on both WebLogic Portal 9.2 for Portal development and WebLogic 8.1 for Data Services Programming
- Worked on creating EJBs that implemented business logic
- Developed the presentation layer using JSP, HTML, CSS and client validations using JavaScript
- Involved in designing and developing the e-commerce site using JSP, Servlets, EJBs, JavaScript and JDBC
- Used Eclipse 6.0 as IDE for application development
- Validated all forms using Struts validation framework and implemented Tiles framework in the presentation layer
- Configured Struts framework to implement MVC design patterns
- Designed and developed GUI using JSP, HTML, DHTML and CSS
- Worked with JMS for messaging interface
Environment: Java, J2EE, HTML, DHTML, JSP, Servlets, XML, EJB, Struts, GIT, WebLogic 8.1, SQL Server 2008 R2, CentOS, UNIX, Linux, Windows 7/Vista/XP
