Big Data / Hadoop Developer Resume
Dallas, TX
SUMMARY
- 6 years of IT experience in software analysis, design, development, testing and implementation of Big Data, Hadoop, NoSQL and Java/J2EE technologies
- 3+ years of hands-on experience with Big Data ecosystems including Hadoop (1.0 and YARN), Tableau, MapReduce, Pig, Hive, Impala, Sqoop, Flume, Oozie, MongoDB, ZooKeeper, Kafka, Maven, Spark, Scala, HBase, Cassandra (CQL)
- Experience in installation, configuration and deployment of Big Data solutions
- Excellent knowledge of Hadoop ecosystem architecture and components such as Hadoop Distributed File System (HDFS), MRv1, MRv2, JobTracker, TaskTracker, NameNode, DataNode, ResourceManager, NodeManager and MapReduce programming
- Experience in analyzing data using Hive UDFs, Hive UDTFs and custom MapReduce programs in Java
- Strong command of Hive and Pig core functionality; wrote Pig Latin UDFs in Java and used various UDFs from Piggybank and other sources
- Hands-on experience with NoSQL databases like HBase and Cassandra and relational databases like Oracle and MySQL
- Worked in Agile/Scrum software development.
- Responsible for committing scripts to a GitHub version control repository and deploying the code using Jenkins.
- Proficient in configuring an active audit framework before ingesting files into HDFS by enabling filename, record count, file size, duplicate, missing file and zero-byte checks; enabled the passive audit check after ingesting data into external Hive tables by matching the source file count against the Hive table count.
- Primarily responsible for designing, implementing, testing and maintaining database solutions for Azure.
- Primarily involved in the data migration process using Azure, integrating with a GitHub repository and Jenkins.
- Hands-on experience with real-time streaming into HDFS using Kafka and Spark Streaming
- Implemented predefined operators in Spark such as map, reduce, sample, filter, count, cogroup, groupBy, sort, reduceByKey, take, groupByKey, union, leftOuterJoin and rightOuterJoin.
- Developed analytical components using Spark SQL and Spark Streaming.
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Spark SQL in Scala
- Deeply involved in writing complex Spark/Scala scripts: wrote UDFs, worked with the Spark context and Cassandra SQL context, used APIs and methods that support DataFrames, RDDs, DataFrame joins and Cassandra table joins, and finally wrote/saved the DataFrames/RDDs to a Cassandra database (a minimal sketch follows this summary).
- Proficient in Java, Collections, J2EE, Servlets, JSP, Spring, Hibernate, JDBC/ODBC
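Below is a minimal, illustrative sketch of the Hive-to-Spark-SQL conversion and Cassandra write described above. The database, keyspace and table names (claims_db.claims, analytics.claims_by_member) and the Cassandra host are hypothetical, and it assumes Spark 1.x with the DataStax spark-cassandra-connector on the classpath:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SaveMode
import org.apache.spark.sql.hive.HiveContext

object HiveToCassandraSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("HiveToCassandraSketch")
      .set("spark.cassandra.connection.host", "127.0.0.1") // assumed Cassandra host
    val sc = new SparkContext(conf)
    val hiveCtx = new HiveContext(sc)

    // Equivalent of a HiveQL aggregation, expressed through Spark SQL
    // (claims_db.claims is a hypothetical Hive table).
    val claimsByMember = hiveCtx.sql(
      """SELECT member_id, count(*) AS claim_cnt
        |FROM claims_db.claims
        |GROUP BY member_id""".stripMargin)

    // Write the resulting DataFrame to Cassandra via the DataStax connector
    // (analytics.claims_by_member is a hypothetical keyspace/table).
    claimsByMember.write
      .format("org.apache.spark.sql.cassandra")
      .options(Map("keyspace" -> "analytics", "table" -> "claims_by_member"))
      .mode(SaveMode.Append)
      .save()

    sc.stop()
  }
}
```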
TECHNICAL SKILLS
Hadoop Ecosystem: Hadoop, HDFS, MapReduce, Hive, Impala, Pig, Sqoop, Oozie, Zena and Zeke scheduling, ZooKeeper, Flume, Kafka, Spark Core, Spark SQL, Spark Streaming, AWS, Azure
NoSQL Databases: HBase, Cassandra, MongoDB
Build Management Tools: Maven, Apache Ant
Java & J2EE Technologies: Core Java, Servlets, JSP, JDBC, JNDI, Java Beans
Languages: C, C++, Java, SQL, PL/SQL, Pig Latin, HiveQL, UNIX shell scripting
Frameworks: MVC, Spring, Hibernate, Struts 1/2, EJB, JMS, JUnit, MRUnit
Version Control & CI: GitHub, Jenkins
Databases: Oracle 8i/9i/10g, Microsoft SQL Server, DB2 & MySQL 4.x/5.x
PROFESSIONAL EXPERIENCE
Confidential, Dallas, TX
Big Data / Hadoop Developer
Responsibilities:
- Created a Zeke event in the FTP process that triggers at the end of the mainframe JCL job for the Stoploss project; the Zeke event in turn triggers the data lake Zena ingestion process.
- Involved in creating JavaScript to enable the process variable that triggers consumption, and enabled the date/timestamp partition.
- Responsible for configuring the active audit framework before ingesting files into HDFS by enabling filename, record count, file size, duplicate, missing file and zero-byte checks; the passive audit check is enabled after ingesting data into external Hive tables by matching the source file count against the Hive table count (a sketch of this check follows the list below).
- Ingested the contract, commission and CVS claims historical files as a one-time load into the incoming raw layer of HDFS, and scheduled the incremental data in the Zena scheduler by date/timestamp partition.
- Added data to new partitions in the external Hive staging table so it could be read by partition, and loaded the external Hive ORC tables with Snappy compression using Pig HCatalog scripts.
- Applied business rules to the data transformations as per requirements and made the data available to downstream consumption teams.
- Worked on the Walgreens Member Search project under tight timelines; configured the ingestion process, applied business requirements in the data transformations by eliminating header data from control files, and exported the processed data from the HDFS outgoing layer to ADW.
- Used Jira for Scrum issue tracking and release management.
- Responsible for moving the ingestion scripts into a GitHub version control repository and deploying the scripts using Jenkins.
- Primarily involved in the data migration process using Azure, integrating with a GitHub repository and Jenkins.
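The passive audit check mentioned above can be sketched as follows; this is only an illustration, with hypothetical HDFS paths and table names (staging.contracts_ext), and assumes a Spark 1.x HiveContext rather than the exact framework used on the project:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

object PassiveAuditCheck {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("PassiveAuditCheck"))
    val hiveCtx = new HiveContext(sc)

    // Hypothetical raw-layer path and external staging table for one partition.
    val sourcePath = "hdfs:///data/incoming/raw/contracts/load_dt=2016-01-01"
    val sourceCount = sc.textFile(sourcePath).count()

    val tableCount = hiveCtx
      .sql("SELECT count(*) FROM staging.contracts_ext WHERE load_dt = '2016-01-01'")
      .collect()(0)
      .getLong(0)

    // Passive audit: the file record count must match the Hive table count.
    if (sourceCount == tableCount)
      println(s"Passive audit passed: $sourceCount records")
    else
      println(s"Passive audit FAILED: file=$sourceCount, hive table=$tableCount")

    sc.stop()
  }
}
```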
Environment: Hadoop, HDFS, Pig, Hive, Java, Sqoop, HBase, Zena Scheduler, Jira, GitHub, Jenkins, Azure
Confidential, Dallas, TX
Big Data / Hadoop Developer
Responsibilities:
- Configured Flume and Kafka to capture data from various sources such as clickstream data and Twitter feeds
- Involved in data ingestion from relational databases into HDFS using Sqoop
- Performed data cleansing and enrichment using Pig Latin and HiveQL
- Built exception files for all non-compliant data using Pig
- Responsible for managing data from various sources
- Created Hive external tables for semantic data, loaded the data into the tables and queried the data using HQL
- Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting (see the sketch after this list)
- Worked with different data sources like Avro data files, XML files, JSON files, SQL Server and Oracle to load data into Hive tables
- Worked on Hive to expose data for further analysis and to transform files from different analytical formats to text files
- Built business metrics as part of the target platform using HiveQL
- Generated final reporting data in Tableau for testing by connecting to the corresponding Hive tables using the Hive ODBC connector
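A hedged sketch of the partitioned/bucketed Hive table and reporting metric described above, issued through a Spark HiveContext so the example stays in Scala; the semantic.click_events table, its columns and its location are hypothetical:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

object HiveMetricsSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("HiveMetricsSketch"))
    val hiveCtx = new HiveContext(sc)

    // Hypothetical semantic-layer table, partitioned by event date and bucketed by user.
    hiveCtx.sql(
      """CREATE EXTERNAL TABLE IF NOT EXISTS semantic.click_events (
        |  user_id STRING, page STRING, ts BIGINT)
        |PARTITIONED BY (event_dt STRING)
        |CLUSTERED BY (user_id) INTO 32 BUCKETS
        |STORED AS ORC
        |LOCATION '/data/semantic/click_events'""".stripMargin)

    // Example business metric: daily distinct visitors for reporting.
    val dailyVisitors = hiveCtx.sql(
      """SELECT event_dt, count(DISTINCT user_id) AS visitors
        |FROM semantic.click_events
        |GROUP BY event_dt""".stripMargin)
    dailyVisitors.show()

    sc.stop()
  }
}
```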
Environment: Hadoop, HDFS, Pig, Hive, Java, Sqoop, Kafka, HBase, NoSQL, Oracle 10g, PL/SQL, SQL Server, Windows NT, Tableau.
Confidential, Dallas, TX
Big Data / Hadoop Developer
Responsibilities:
- Created a port for live streaming; data is consumed by the Spark streaming context
- Used Maven to build and deploy the jar for spark-submit; the streaming job uses a sliding window interval of 5 seconds (see the sketch after this list)
- Used the Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive.
- The generated output is stored and used to create Spark DataFrames for further analysis
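A minimal sketch of this kind of socket-based streaming job with a 5-second sliding window; the host, port and word-count logic are placeholders, since the actual source and computation are not specified above:

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object PortStreamingSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("PortStreamingSketch")
    val ssc = new StreamingContext(conf, Seconds(1)) // assumed 1-second batch interval

    // Hypothetical host/port for the live stream.
    val lines = ssc.socketTextStream("localhost", 9999)

    // Count words over a 5-second sliding window.
    val windowedCounts = lines
      .flatMap(_.split("\\s+"))
      .map(word => (word, 1))
      .reduceByKeyAndWindow(_ + _, Seconds(5), Seconds(5))

    windowedCounts.print()

    ssc.start()
    ssc.awaitTermination()
  }
}
```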
Environment: Spark Streaming, Scala, Maven (web log streaming using Spark)
Confidential, Dallas, TX
Big Data / Hadoop Developer
Responsibilities:
- Used a Flume agent to simulate Confidential log files as the source, with a Spark sink as the sink
- Generated live streaming with a sliding window interval of 10 seconds
- Custom Scala functions were added to the source program for multiple operations
- The generated output is transformed into Spark DataFrames/RDDs and written to a Cassandra database (see the sketch below)
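A hedged sketch of this Flume-to-Spark-to-Cassandra pipeline, using the pull-based Flume Spark sink, a 10-second window and the DataStax spark-cassandra-connector; the host, port, keyspace, table and column names are hypothetical:

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.flume.FlumeUtils
import com.datastax.spark.connector._

object FlumeToCassandraSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("FlumeToCassandraSketch")
      .set("spark.cassandra.connection.host", "127.0.0.1") // assumed Cassandra host
    val ssc = new StreamingContext(conf, Seconds(2))

    // Pull events from the Flume SparkSink assumed to run on localhost:4545.
    val flumeStream = FlumeUtils.createPollingStream(ssc, "localhost", 4545)

    // Decode event bodies and count identical log lines over a 10-second window.
    val lineCounts = flumeStream
      .map(event => new String(event.event.getBody.array(), "UTF-8"))
      .map(line => (line, 1L))
      .reduceByKeyAndWindow(_ + _, Seconds(10), Seconds(10))

    // Save each windowed batch to Cassandra (hypothetical keyspace/table/columns).
    lineCounts.foreachRDD { rdd =>
      rdd.saveToCassandra("logs_ks", "line_counts", SomeColumns("line", "cnt"))
    }

    ssc.start()
    ssc.awaitTermination()
  }
}
```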
Environment: Flume, Spark, Scala, Maven, Cassandra
Confidential
Hadoop developer
Responsibilities:
- Worked on analyzing the Hadoop stack and different big data tools including Pig, Hive, the HBase database and Sqoop
- Worked on importing and exporting data from Oracle and DB2 into HDFS using Sqoop
- Developed Pig Latin scripts to extract data from the web server output files and load it into HDFS
- Designed and developed user-defined functions to provide custom Hive and Pig capabilities across the application teams (see the sketch after this list)
- Created Hive external tables, loaded the data into the tables and queried the data using HQL
- Collected log data from web servers and integrated it into HDFS using Flume
- Worked on Impala to expose data for further analysis and to transform files from different analytical formats to text files
- Implemented test scripts to support test driven development and continuous integration
- Worked on tuning the performance of Hive and Pig queries
- Supported in setting up QA environment and updating configurations for implementing scripts with Pig, Hive and Sqoop
- Worked in Agile/Scrum software development.
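A minimal sketch of a custom Hive UDF of the kind described above, written in Scala (rather than the project's Java) so all examples here share one language; the function name and behaviour are purely illustrative:

```scala
import org.apache.hadoop.hive.ql.exec.UDF
import org.apache.hadoop.io.Text

// Hypothetical UDF: trim input and keep the first five characters of a zip code.
class NormalizeZip extends UDF {
  // Hive calls evaluate() once per row; handle nulls defensively.
  def evaluate(input: Text): Text = {
    if (input == null) null
    else new Text(input.toString.trim.take(5))
  }
}

// Registered in Hive roughly like:
//   ADD JAR /path/to/udfs.jar;
//   CREATE TEMPORARY FUNCTION normalize_zip AS 'NormalizeZip';
//   SELECT normalize_zip(zip_code) FROM members;
```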
Environment: HDFS, Java, MapReduce, Pig, Hive, Impala, HBase, Oozie, Sqoop, Flume, Linux.
Confidential
Java Developer
Responsibilities:
- Worked on both WebLogic Portal 9.2 for Portal development and WebLogic 8.1 for Data Services Programming
- Worked on creating EJBs that implemented business logic
- Developed the presentation layer using JSP, HTML, CSS and client validations using JavaScript
- Involved in designing and developing the e-commerce site using JSP, Servlets, EJBs, JavaScript and JDBC
- Used Eclipse 6.0 as IDE for application development
- Validated all forms using Struts validation framework and implemented Tiles framework in the presentation layer
- Configured Struts framework to implement MVC design patterns
- Designed and developed GUI using JSP, HTML, DHTML and CSS
- Worked with JMS for messaging interface
Environment: Java, J2EE, HTML, DHTML, JSP, Servlets, XML, EJB, Struts, GIT, WebLogic 8.1, SQL Server 2008 R2, CentOS, UNIX, Linux, Windows 7/Vista/XP
