Sr. Big Data/Hadoop Engineer Resume
Atlanta, GA
SUMMARY:
- 8+ years of professional IT experience covering analysis, design, coding, testing, implementation, and support in Java and Big Data technologies, working with Apache Hadoop ecosystem components.
- 4+ years of dedicated experience with Hadoop and its components, including HDFS, MapReduce, Apache Pig, Hive, Sqoop, HBase, and Oozie.
- Involved in writing Pig scripts and Pig UDFs to pre-process data for analysis.
- Experience creating Hive external and managed tables and writing queries against them.
- Hands-on experience troubleshooting operational issues and identifying root causes in Hadoop clusters.
- Expertise in managing data from multiple sources and transforming large data sets.
- Extensively used Sqoop to import data from RDBMS into HDFS and export it back.
- Designed and created Hive external tables using a shared metastore (instead of the default Derby metastore) with partitioning, dynamic partitioning, and bucketing.
- Experience with the Oozie workflow scheduler, managing Hadoop jobs as Directed Acyclic Graphs (DAGs) of actions with control flows.
- Experience in integrating Hive and HBase for effective operations.
- Good understanding of common file formats and compression codecs such as Avro, Snappy, and LZO.
- Experienced with the Spark ecosystem, using Spark SQL and Scala to query data in file formats such as .txt and .csv.
- Hands-on experience migrating MapReduce jobs to Spark RDD transformations using Scala.
- Good experience with the Cloudera, Hortonworks, and Apache Hadoop distributions.
- Strong understanding of NoSQL databases and hands-on experience writing applications against HBase, Cassandra, MongoDB, Redis, and Neo4j.
- Working knowledge of the major Hadoop ecosystem components Pig, Hive, Sqoop, and Flume.
- Experience implementing custom Partitioners and Combiners for effective data distribution (a brief sketch follows this summary).
- Experience writing MapReduce jobs for text mining and predictive analysis.
- Experience analyzing data using CQL, HiveQL, and Pig Latin.
- Good working knowledge of MapReduce and Apache Pig.
- Experience in application development using Java, J2EE, EJB, Hibernate, JDBC, Jakarta Struts, JSP and Servlets.
- Experience using IDEs such as Eclipse and MyEclipse, and repositories such as SVN and CVS.
- Experience using the build tools Ant and Maven.
- Comfortable working under different delivery approaches, including Agile, Scrum, and Waterfall methodologies.
- Excellent communication and analytical skills; flexible in adapting to evolving technologies.
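The bullet on custom Partitioners and Combiners above can be illustrated with a minimal sketch; the class name, key layout, and job wiring below are assumptions for illustration, not code from any engagement on this resume.

```java
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Partitioner;

/** Routes records to reducers by merchant-code prefix so related keys land on the same reducer. */
public class MerchantPartitioner extends Partitioner<Text, IntWritable> {
    @Override
    public int getPartition(Text key, IntWritable value, int numPartitions) {
        // Keys are assumed to look like "MERCHANTCODE|field"; partition on the code only.
        String merchantCode = key.toString().split("\\|", 2)[0];
        return (merchantCode.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }
}

// Hypothetical job wiring, with a map-side combiner to pre-aggregate counts:
//   Job job = Job.getInstance(conf, "merchant counts");
//   job.setPartitionerClass(MerchantPartitioner.class);
//   job.setCombinerClass(org.apache.hadoop.mapreduce.lib.reduce.IntSumReducer.class);
```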
TECHNICAL SKILLS:
Languages: C, C++, Python, Java, J2EE, SQL, PL/SQL, Scala, UML, XML
Hadoop Ecosystem: HDFS, MapReduce, Spark Core, Spark Streaming, Spark SQL, Hive, Pig, Sqoop, Flume, Kafka, Oozie, Zookeeper.
Databases: Oracle 10g/11g, SQL Server, MySQL, DB2
NoSQL: HBase, Cassandra, MongoDB
Application / Web Servers: Apache Tomcat, JBoss, Mongrel, WebLogic, WebSphere
Web Services: SOAP, REST
Operating systems: Windows, Unix, Linux
Microsoft Products: MS Office, MS Visio, MS Project
Frameworks: Spring, Hibernate, Struts
PROFESSIONAL EXPERIENCE:
Confidential, Atlanta, GA
Sr. Big Data/Hadoop Engineer
Roles & Responsibilities:
- Moved all crawl data flat files generated from various retailers to HDFS for further processing.
- Wrote Apache Pig scripts to process the HDFS data.
- Created Hive tables to store the processed results in a tabular format.
- Developed Sqoop scripts to move data between the MySQL database and HDFS for Pig processing.
- Wrote script files for processing data and loading it into HDFS.
- Performed HDFS file operations through CLI commands.
- Developed UNIX shell scripts to generate reports from Hive data.
- Fully involved in the requirements analysis phase.
- Created two separate users (hduser for HDFS operations and mapred for MapReduce operations only).
- Ensured NFS was configured for the NameNode.
- Set up passwordless SSH for the Hadoop cluster.
- Wrote Pig scripts to process credit card and debit card transactions for active customers across various merchants, joining HDFS data with Hive tables via HCatalog.
- Responsible for writing a Lucene search program for high-performance, full-featured text search of merchants.
- Wrote Python UDFs, invoked via streaming, to apply regular expressions and return valid merchant codes and names.
- Created Hive scripts to join raw data with lookup data and perform aggregations per business requirements.
- Set up cron jobs to delete old Hadoop logs, local job files, and cluster temp files.
- Set up Hive with MySQL as a remote metastore.
- Moved all log/text files generated by various products into HDFS.
- Wrote MapReduce code that takes log files as input, parses them, and structures the records in tabular format to facilitate effective querying of the log data (see the sketch at the end of this section).
- Loaded data from the UNIX file system into HDFS and vice versa.
- Implemented real-time data ingestion using Kafka.
- Created external Hive tables on top of the parsed data.
Environment: Hadoop, HDFS, MapReduce, Apache Pig, Hive, Sqoop, Linux, MySQL, Spark, HBase, Hortonworks HDP 2.6.5
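Illustrative only: a minimal mapper of the log-parsing kind described above. The log layout, field order, and class name are assumptions, not the original implementation.

```java
import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

/** Parses raw log lines into tab-delimited records that Hive or Pig can query directly. */
public class LogParseMapper extends Mapper<LongWritable, Text, NullWritable, Text> {

    // Assumed line layout: "2017-03-01 10:15:02 INFO retailerId=42 action=crawl status=OK"
    @Override
    protected void map(LongWritable offset, Text line, Context context)
            throws IOException, InterruptedException {
        String[] parts = line.toString().split("\\s+");
        if (parts.length < 6) {
            context.getCounter("logparse", "malformed").increment(1);
            return;                                   // skip malformed lines
        }
        String record = String.join("\t",
                parts[0] + " " + parts[1],            // timestamp
                parts[2],                             // log level
                parts[3].replace("retailerId=", ""),  // retailer id
                parts[4].replace("action=", ""),      // action
                parts[5].replace("status=", ""));     // status
        context.write(NullWritable.get(), new Text(record));
    }
}
```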
Confidential, Chicago, IL
Hadoop Developer/Spark Developer
Roles & Responsibilities:
- Imported data from MySQL into HDFS using Sqoop on a regular basis.
- Developed scripts and batch jobs to schedule various Hadoop programs.
- Wrote Hive queries for data analysis to meet the business requirements.
- Created Hive tables and worked with them using HiveQL.
- End-to-end performance tuning of Hadoop clusters and Hadoop MapReduce routines against very large data sets.
- Exported analyzed data from HDFS using Sqoop for report generation.
- Used MapReduce and Sqoop to load, aggregate, store and analyze web log data from different web servers.
- Wrote Java UDFs for Pig and Hive to convert card names to upper case and format dates suitably (see the sketch at the end of this section).
- Responsible for building the Docker containers and scheduling the Oozie workflows run during the sprints.
- Periodically reviewed Hadoop-related logs, fixing errors and preventing issues by analyzing warnings.
- Analyzed system failures, identified root causes, and recommended courses of action.
- End-to-end involvement in data ingestion, cleansing, and transformation in Hadoop.
- Developed Hive queries for the analysts.
- Used Impala to read, write, and query Hadoop data in HDFS and Cassandra, and configured Kafka to read and write messages from external programs.
- Optimized existing Hadoop algorithms using SparkContext, Spark SQL, DataFrames, and pair RDDs.
- Provided cluster coordination services through ZooKeeper.
- Wrote Storm spouts and bolts to collect real-time customer data from the Kafka broker, process it, and store it in HBase.
- Analyzed log files and processed them through Flume.
- Optimized MapReduce algorithms using combiners and partitioners to deliver the best results, and worked on application performance optimization.
Environment: Hadoop, HDFS, MapReduce, Apache Pig, Hive, Sqoop, Linux, MySQL, Spark
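A rough sketch of a Java UDF of the kind mentioned above, shown here as a Pig EvalFunc (a Hive UDF would follow the same idea with org.apache.hadoop.hive.ql.exec.UDF); the class name, null handling, and usage lines are assumptions.

```java
import java.io.IOException;
import org.apache.pig.EvalFunc;
import org.apache.pig.data.Tuple;

/** Pig UDF: returns the card name in upper case, or null for empty input. */
public class UpperCaseCardName extends EvalFunc<String> {
    @Override
    public String exec(Tuple input) throws IOException {
        if (input == null || input.size() == 0 || input.get(0) == null) {
            return null;
        }
        return input.get(0).toString().trim().toUpperCase();
    }
}

// Hypothetical usage from a Pig script:
//   REGISTER card-udfs.jar;
//   cards = FOREACH transactions GENERATE UpperCaseCardName(card_name) AS card_name, txn_date;
```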
Confidential, Columbus, OH
Hadoop Developer
Roles & Responsibilities:
- Imported data from MySQL into HDFS using Sqoop on a regular basis.
- Developed scripts and batch jobs to schedule various Hadoop programs.
- Wrote Flume configuration files for importing streaming log data into MongoDB.
- Performed masking of sensitive customer data using Flume interceptors (see the sketch at the end of this section).
- Used Impala to analyze data ingested into Hive tables and compute various metrics for dashboard reporting.
- Implemented Spark jobs in Scala, utilizing DataFrames and the Spark SQL API for faster data processing.
- Wrote Hive queries for data analysis to meet the business requirements.
- Developed a POC to pull real-time Twitter streams using Kafka, Flume, and Spark.
- Used the Parquet file format for Hive table creation.
- Implemented Kerberos security.
- Created users in Active Directory and mapped the roles in each group for those users in Apache Sentry.
- Involved in LDAP implementation for different types of AD access for Hue, Hive, and Pig.
- Handled Hive and Spark tuning end to end, including partitioning/bucketing of ORC tables and executor/driver memory settings.
- Involved in extracting, transforming, and loading data from Hive into an RDBMS.
- Optimized MapReduce Jobs to use HDFS efficiently by using various compression mechanisms.
- Involved in extracting, transforming, and loading data into the HBase database.
- Implemented partitioning and bucketing in Hive based on the requirements.
- Involved in transforming data within the Hadoop cluster.
- Used MapReduce to parse weblog data, converting raw weblogs into parsed, delimited records.
- Wrote Hive UDFs to extract data from staging tables.
- Developed Hive external tables.
- Created HBase tables to store JSON data.
Environment: Eclipse, JDK 1.8.0, Hadoop 2.8, HDFS, MapReduce, Pig 0.15.0, Hive 2.0, HBase, Kerberos, Apache Maven 3.0.3
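A hypothetical sketch of masking with a custom Flume interceptor, as mentioned above; the card-number regex and class names are assumptions rather than the original code.

```java
import java.nio.charset.StandardCharsets;
import java.util.List;
import org.apache.flume.Context;
import org.apache.flume.Event;
import org.apache.flume.interceptor.Interceptor;

/** Flume interceptor that masks card numbers in the event body before it reaches the sink. */
public class MaskingInterceptor implements Interceptor {

    // Assumed: 16-digit card numbers appear in the raw log line.
    private static final String CARD_PATTERN = "\\b\\d{12}(\\d{4})\\b";

    @Override public void initialize() { }

    @Override
    public Event intercept(Event event) {
        String body = new String(event.getBody(), StandardCharsets.UTF_8);
        // Keep only the last four digits, e.g. "************1234".
        String masked = body.replaceAll(CARD_PATTERN, "************$1");
        event.setBody(masked.getBytes(StandardCharsets.UTF_8));
        return event;
    }

    @Override
    public List<Event> intercept(List<Event> events) {
        for (Event e : events) {
            intercept(e);
        }
        return events;
    }

    @Override public void close() { }

    /** Flume instantiates interceptors through a Builder named in the agent configuration. */
    public static class Builder implements Interceptor.Builder {
        @Override public Interceptor build() { return new MaskingInterceptor(); }
        @Override public void configure(Context context) { }
    }
}
```

In the agent configuration, such an interceptor would typically be attached to a source via an interceptors entry whose type points at the Builder class.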
Confidential
Java Developer
Roles & Responsibilities:
- Analyzed the feasibility documents.
- Coded the business logic methods in core Java.
- Involved in developing the Action classes and ActionForms based on the Struts framework.
- Participated in client-side and server-side validation.
- Involved in creating the Struts configuration file and validation file for the skip module using the Struts framework.
- Developed Java programs, JSP pages, and servlets using the Spring framework.
- Involved in creating database tables and writing complex T-SQL queries and stored procedures in SQL Server.
- Worked with the AJAX framework to get asynchronous responses to user requests and used JavaScript for validation.
- Used EJBs in the application and developed Session beans to implement business logic at the middle tier level.
- Actively involved in writing SQL using SQL Query Builder.
- Used JAXB to read and manipulate XML properties (see the sketch at the end of this section).
- Used JNI to call libraries and other functions implemented in C.
- Handled server-related issues, new requirements, changes, and patch movements.
- Developed RESTful web services for various XSD schemas.
- Used servlets to implement business components.
- Designed and developed the required manager classes for database operations.
- Developed various servlets for monitoring the application.
- Designed UML class and sequence diagrams for Trade Services.
- Designed the complete Hibernate mapping for SQL Server for PDM.
- Designed the complete JAXB classes mapping for various XSD schemas.
- Involved in writing JUnit test classes for unit testing.
Environment: Eclipse Neon, JDK 1.8.0, Java, Servlets, JSP, EJB, XML, SQL Server, Struts, JUnit, SQL, UNIX, UML, Apache Maven 3.0.3
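For illustration, a minimal JAXB read-modify-write sketch along the lines of the JAXB usage above; the TradeService element and its fields are hypothetical stand-ins for the actual XSD-generated classes.

```java
import java.io.File;
import javax.xml.bind.JAXBContext;
import javax.xml.bind.JAXBException;
import javax.xml.bind.Marshaller;
import javax.xml.bind.annotation.XmlElement;
import javax.xml.bind.annotation.XmlRootElement;

/** Hypothetical JAXB-mapped class standing in for the XSD-generated ones. */
@XmlRootElement(name = "tradeService")
class TradeService {
    @XmlElement public String serviceName;
    @XmlElement public String endpoint;
}

public class JaxbExample {
    public static void main(String[] args) throws JAXBException {
        JAXBContext ctx = JAXBContext.newInstance(TradeService.class);

        // Read (unmarshal) the XML configuration into Java objects.
        TradeService svc = (TradeService) ctx.createUnmarshaller()
                .unmarshal(new File("trade-service.xml"));

        // Manipulate a property, then write (marshal) it back out.
        svc.endpoint = "https://example.invalid/trade";
        Marshaller m = ctx.createMarshaller();
        m.setProperty(Marshaller.JAXB_FORMATTED_OUTPUT, Boolean.TRUE);
        m.marshal(svc, new File("trade-service.xml"));
    }
}
```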
Confidential
Java Developer
Roles & Responsibilities:
- Identified, reviewed, assessed, and resolved production issues.
- Configured and maintained the associated application components and environments as required.
- Provided application support to management, team members, and end users.
- Gained experience with the sales functionality and order management.
- Worked on email template creation in HTML per the requirements.
- Configured email notifications based on the requirements.
- Involved in writing programs for XA transaction management across the application's multiple databases.
- Developed Java programs, JSP pages, and servlets using the Jakarta Struts framework.
- Involved in creating database tables and writing complex T-SQL queries and stored procedures in SQL Server.
- Worked with the AJAX framework to get asynchronous responses to user requests and used JavaScript for validation.
- Used EJBs in the application and developed session beans to implement business logic at the middle tier (see the sketch at the end of this section).
- Actively involved in writing SQL using SQL Query Builder.
- Involved in coordinating onshore/offshore development and mentoring new team members.
- Extensively used Ant to build and configure J2EE applications and used Log4J for application logging.
- Used JAXB to read and manipulate XML properties.
- Used JNI to call libraries and other functions implemented in C.
- Used Prototype, MooTools, and script.aculo.us for a fluid user interface.
- Involved in fixing defects and unit testing with JUnit test cases.
- Involved in configuring business components, views, applets, controls, menus, and other objects to meet the business requirements.
Environment: JDK 1.8.0, Java, Servlets, JSP, XML, SQL Server, JUnit, Eclipse, UNIX, UML, Apache Maven 3.0.3, EJB, XSLT, CVS, J2EE, AJAX, Struts, Hibernate, Ant, Tomcat, JMS, Log4J, Oracle 10g, Solaris, Windows 7/XP
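A brief, hypothetical sketch of the stateless session-bean pattern mentioned in these Java roles (EJB 3 style); the OrderService name and its business rule are illustrative only.

```java
import javax.ejb.Stateless;

/** Stateless session bean holding middle-tier business logic for order totals. */
@Stateless
public class OrderService {

    /** Applies a flat 10% discount above a threshold; purely illustrative business rule. */
    public double totalWithDiscount(double subtotal) {
        double discount = subtotal > 1000.0 ? 0.10 : 0.0;
        return subtotal * (1.0 - discount);
    }
}

// A client (e.g. a servlet) would typically obtain the bean via injection:
//   @EJB private OrderService orderService;
//   double total = orderService.totalWithDiscount(cartSubtotal);
```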