Big Data / Hadoop Lead Resume Dallas, TX - Hire IT People

SUMMARY:

Over 8 years of experience in data analysis, data modeling, and implementation of enterprise-class systems spanning Big Data, data integration, object-oriented programming, data warehousing, and advanced analytics.
4 years of experience with Hadoop, HDFS, MapReduce, and the Hadoop ecosystem (Hive, Oozie, Kafka, Impala, Spark, Avro, JSON).
Good knowledge of Hive optimization with ORC, Partitions and Bucketing.
Built scheduled data ingestion workflows using Sqoop and the Oozie scheduler.
Hands-on experience writing MapReduce jobs in Java.
Hands-on experience writing Pig Latin scripts, Pig commands, and Hive queries.
Good knowledge of and experience with Spark and Kafka.
Hands-on experience installing, configuring, and using ecosystem components such as Hadoop MapReduce, HDFS, Sqoop, Pig, Scala, Hive, Impala, and Spark.
Experience in database development using SQL and PL/SQL, working with databases such as Oracle 9i/10g, Informix, and SQL Server.
Hands-on experience with AWS (EC2, Redshift, CloudWatch, Route 53, EMR, CloudFront, S3, IAM).
Experience working on NoSQL databases including HBase & MongoDB.
Experience using Sqoop to import data into HDFS from RDBMS and vice versa.
Experience in database design and development using relational databases (Oracle, MS SQL Server 2005/2008, MySQL) and NoSQL databases (MongoDB, Cassandra, HBase).
Effective team player with excellent communication skills and the insight to determine priorities, schedule work, and meet critical deadlines.
Good working experience with file formats such as Avro, JSON, and Parquet in Hadoop tools using SerDe concepts.
Experience analyzing data in Spark using Scala and PySpark.
Optimized Hive tables using techniques such as partitioning and bucketing to improve HiveQL query performance.
Experience importing streaming data into HDFS using Flume sources and sinks and transforming the data using Flume interceptors.
Experience using Java tools in business, web, and client-server environments, including Java, JDBC, Servlets, JSP, Struts, JasperReports, and SQL.
Proficient with IDEs such as Eclipse and NetBeans.
Experienced with scripting languages such as Python and shell.

TECHNICAL SKILLS:

Big Data Ecosystem: Hadoop, MapReduce, HDFS, HBase, Spark, Scala, Impala, Hive, Pig, Oozie, Sqoop, Flume, Kafka, CDH4, JSON, Avro

Java Technologies: Java 5, Java 6, JAXP, AJAX, I18N, JFC Swing, Log4j, JavaHelp API

Methodologies: Agile, UML, Design Patterns

Database: Oracle 10g, DB2, MySQL, NoSQL (MongoDB, HBase, Cassandra)

Cloud: AWS (EC2, Redshift, CloudWatch, Route 53, EMR, CloudFront, S3, IAM)

Application Server: Apache Tomcat 5.x/6.0, JBoss 4.0

Web Tools: HTML, JavaScript, XML, DTD, Schemas, XSL, XSLT, XPath, DOM, XQuery

Tools: SQL Developer, DbVisualizer, Hortonworks

IDE / Testing Tools: NetBeans, Eclipse, WSAD, RAD, MATLAB

Operating System: Windows, Linux

Scripts: Bash, Python, ANT

Testing API: JUnit

PROFESSIONAL EXPERIENCE:

Confidential, Dallas, TX

Big Data / Hadoop Lead

Responsibilities:

Involved in the project from the POC onward, working from data staging through to DataMart population and reporting.
Worked in an onsite-offshore environment.
Solely responsible for creating the data model for storing and processing data and for generating and reporting alerts. This model is being implemented as the standard across all regions as a global solution.
Took part in discussions with other regional teams and guided them on the Citi Big Data platform and the AML cards data model and strategy.
Responsible for technical design and review of the data dictionary (business requirements).
Responsible for providing technical solutions and workarounds.
Migrated the required data from the data warehouse and product processors into HDFS using Talend and Sqoop, and imported flat files in various formats into HDFS.
Analyzed and developed loads from flat files into Cassandra using the Spark Cassandra connector.
Used Spark Streaming to bring all credit card transactions into the Hadoop environment.
Involved in the design of the overall Citigroup Big Data architecture.
Involved in discussions with source system teams on data quality (DQ) issues.
Integrated the Hive warehouse with Spark and Impala. Impala was later replaced with Spark because of a security issue with Impala.
Comfortable with Scala functional programming idioms and very familiar with Iteratee/Enumerator streaming patterns. Almost all DQ and end-to-end reconciliation is done in Scala and Spark.
Implemented partitioning, dynamic partitions, indexing, and bucketing in Hive.
Created custom UDFs in Java to overcome Hive limitations on Cloudera CDH5.
Used Hive for data processing and batch data filtering, and Spark/Impala for other value-centric data filtering.
Supported and monitored MapReduce programs running on the cluster.
Monitored logs and responded accordingly to any warning or failure conditions.
Responsible for preserving code and design integrity using SVN and SharePoint.
Gave a demo to business users on using Datameer for analytics.

Environment: Apache Hadoop, HDFS, Hive, MapReduce, Java, Talend, Spark, Cassandra, Impala, Scala, Sqoop, Cloudera CDH5, Platform, SVN, SharePoint, Datameer, and Maven.

Confidential, New York City, NY

Big Data / Hadoop Developer

Responsibilities:

Developed templates and screens in HTML and JavaScript.
Used the Struts validation framework for form-level validation.
Wrote test cases in JUnit for unit testing of classes.
Used Spring to develop modules that help the product handle different requirements.
Implemented CDH3 Hadoop cluster on CentOS.
Implemented POCs to configure DataStax Cassandra with Hadoop.
Launched Amazon EC2 cloud instances using Amazon Machine Images (Linux/Ubuntu) and configured the launched instances for specific applications.
Launched and set up a Hadoop cluster, including configuration of its different components.
Hands-on experience loading data from the UNIX file system into HDFS.
Performed Cassandra query operations using the Thrift API for real-time analytics.
Provided cluster coordination services through ZooKeeper.
Installed and configured Flume, Hive, Pig, Sqoop and Oozie on the Hadoop cluster.
Created Hive tables, loaded data, and ran Hive queries on that data.
Extensive working knowledge of partitioned tables, UDFs, performance tuning, compression-related properties, and the Thrift server in Hive.
Wrote optimized Pig scripts, and developed and tested Pig Latin scripts.
Working knowledge of writing Pig Load and Store functions.

Environment: Apache Hadoop 1.0.1, MapReduce, HDFS, CentOS, ZooKeeper, Sqoop, Cassandra, Hive, Pig, Oozie, Java, Eclipse, Amazon EC2, JSP, Servlets.

Confidential, Somerset, NJ

Java/J2EE Developer

Responsibilities:

Coded the business methods according to the IBM Rational Rose UML model.
Extensively used Core Java, Servlets, JSP and XML.
Used Struts 1.2 in the presentation tier.
Generated the Hibernate XML and Java mappings for the schemas.
Used DB2 to store the system data.
Used Rational Application Developer (RAD) as the Integrated Development Environment (IDE).
Performed unit testing of all components using JUnit.
Used the Apache Log4j logging framework for trace logging and auditing.
Used Asynchronous JavaScript and XML (AJAX) for a faster, more interactive front end.
Used IBM WebSphere as the application server.
Used IBM Rational ClearCase for version control.

Environment: Java 1.6, Servlets, JSP, Struts 1.2, IBM Rational Application Developer (RAD) 6, WebSphere 6.0, iText, AJAX, Rational ClearCase, Rational Rose, Oracle 9i, Log4j.

Confidential

Java Developer

Responsibilities:

Involved in the complete software development life cycle (SDLC) of the application, from requirement analysis to testing.
Developed modules based on the Struts MVC architecture.
Developed the UI using JavaScript, JSP, HTML, and CSS for interactive cross-browser functionality and a complex user interface.
Created business logic using Servlets and session beans and deployed them on WebLogic Server.
Used the Struts MVC framework for application design.
Created complex SQL queries, PL/SQL stored procedures, and functions for the back end.
Prepared the functional, design, and test case specifications.
Wrote stored procedures in Oracle to perform database-side validations.
Performed unit testing, system testing, and integration testing.
Developed unit test cases and used JUnit for unit testing of the application.
Provided technical support for production environments by resolving issues, analyzing defects, and providing and implementing fixes. Resolved higher-priority defects on schedule.
Used Eclipse IDE for all coding in Java, Servlets and JSPs.
Coordinated with the QA lead on development of the test plan, test cases, test code, and actual testing; responsible for defect allocation and for ensuring that defects were resolved.
Used Flex Styles and CSS to manage the look and feel of the application.
Deployed the application on WebSphere Application Server.

Environment: Java 6, Eclipse, Apache Tomcat Web Server, JSP, JavaScript, AWT, Servlets, JDBC, HTML, FrontPage 2000, Oracle, CVS.
