Big Data Consultant Resume
Minnesota
PROFESSIONAL SUMMARY:
- Over 8 years of experience with emphasis on Big Data/Hadoop technologies and the design and development of Java-based enterprise applications.
- Experience in the complete software development life cycle (SDLC): user interaction, design, development, implementation, integration, documentation, all types of testing, deployment, builds, and configuration and code management.
- Expertise in the creation of on-premises and cloud data lakes.
- Experience working with Cloudera Distribution of Hadoop.
- Expertise in HDFS, MapReduce, Spark, Hive, Pig, Sqoop, HBase, Oozie, Flume, and various other ecosystem components.
- Having knowledge in Hadoop Cluster Setup, Integrations, and Installations.
- Expertise in Spark framework for batch and real time data processing.
- Experience working with BI teams to translate big data requirements into Hadoop-centric solutions.
- Experience in performance tuning the Hadoop cluster by gathering and analyzing the existing infrastructure.
- Working experience in designing and implementing complete end-to-end Hadoop infrastructure including Pig, Hive, Sqoop, Oozie, Flume, and ZooKeeper.
- Extensive experience with big data query tools like Pig Latin and HiveQL.
- Experience with SequenceFile, Avro, and HAR file formats and compression.
- Experience in tuning and troubleshooting performance issues in Hadoop cluster.
- Experience on monitoring, performance tuning, SLA, scaling and security in Big Data systems.
- Experience in working with MapReduce programs using Apache Hadoop to process Big Data.
- Hands-on NoSQL database experience with HBase, MongoDB, and Cassandra.
- Extensive experience in Data Ingestion, In-Stream data processing, Batch Analytics and Data Persistence strategy.
- Experience in designing and architecting large scale distributed applications.
- Knowledge on converting MapReduce applications to Spark.
- Experience in working with flume to load the log data from multiple sources directly into HDFS.
- Experience in data migration from existing data stores and mainframe NDM (Network Data Mover) to Hadoop.
- Experience in handling multiple relational databases: MySQL, SQL Server, and Oracle.
- Experience in supporting data analysis projects using Elastic MapReduce on the Amazon Web Services (AWS) cloud, including exporting and importing data to and from S3.
- Experience in designing both time driven and data driven automated workflows using Oozie.
- Experience in supporting analysts by administering and configuring HIVE.
- Experience in running Pig and Hive scripts.
- Experience in fine-tuning MapReduce jobs for better scalability and performance.
- Developed various MapReduce applications to perform ETL workloads on terabytes of data.
- Performed Importing and exporting data into HDFS and Hive using Sqoop.
- Experience in writing Linux/UNIX shell scripts to dump sharded data from landing zones to HDFS.
- Worked on predictive modeling techniques like Neural Networks, Decision Trees and Regression Analysis.
- Experience in Data mining and Business Intelligence tools such as Teradata, Informatica and MSBI.
- Strong experience as a senior Java Developer in Web/intranet, Client/Server technologies using Java, J2EE, Servlets, JSP, EJB, JDBC.
- In-depth knowledge of object-oriented programming (OOP) methodologies and object-oriented features such as inheritance, polymorphism, exception handling, and templates, with development experience in Java technologies.
- Well experienced in training, mentoring, and motivating members of my own and other teams to achieve company goals.
- Involved in in-room and telephonic Scrum meetings to gather, analyze, and track requirements and development.
- Strong experience in client interaction and in understanding business applications, business data flow, and data relationships.
- Experience with different operating systems: UNIX, Linux, and Windows.
- Strong troubleshooting and production support skills and the ability to interact with end users.
- Good working knowledge of industry best practices for enterprise development, including implementing and adhering to design patterns.
- Experience in problem solving, analysis, implementation, installation, and configuration skills.
- Good interpersonal skills; committed, result-oriented, and hard-working, with a zeal to learn new technologies and undertake challenging tasks. Excellent team member with strong communication skills, capable of meeting set deadlines.
TECHNICAL SKILLS:
Programming Languages: C, Java, C#, LINUX/UNIX Shell Scripting.
Big Data Technologies: Apache Hadoop, Cloudera Distribution (HDFS & MapReduce)
Hadoop Ecosystem: YARN, Spark, Pig, Hive, Sqoop, Flume, ZooKeeper, Oozie, Hue.
NoSQL: HBase, Cassandra, MongoDB
Database Tools: SQL Server 2008, MySQL, Oracle 10g.
Operating Systems: Windows XP/7/8, LINUX, UNIX.
Version Control & Issue Tracking: CVS, SVN, Git, MantisBT and JIRA.
Remote Access & File Transfer: PuTTY, FileZilla and WinSCP.
IDEs and Utilities: Eclipse, NetBeans.
Data Integration Tools: Talend, Datameer
Others: Spring MVC, XML, SOAP, AWS
PROFESSIONAL EXPERIENCE:
Confidential, Minnesota
Big Data Consultant
Environment: Hadoop, CDH 4.x, Hue, MapReduce, Hive, Pig, Sqoop, Oozie, NoSQL, Core Java, JDBC, J2EE, Teradata, SVN, Eclipse, PuTTY, WinSCP, Shell Scripting, and Ubuntu 10.
Responsibilities:
- Performed Sqoop imports of data from Data warehouse platform to HDFS and built hive tables on top of the datasets.
- Built ETL workflow to process data on hive tables.
- Used Hue to create Oozie workflows that perform different kinds of actions such as Hive, Java, and MapReduce actions.
- Worked extensively in Hive, using features such as UDFs and UDAFs (a representative UDF sketch appears after this list).
- Supported MapReduce programs running on the cluster.
- Responsible for managing data coming from different sources.
- Used Sequence and Avro file formats with Snappy compression while storing data in HDFS; used efficient columnar storage (Parquet) for data consumed by the business.
- Worked extensively in MapReduce using Java; well versed with features such as MultipleOutputs.
- Worked with various types of Hive SerDes.
- Mentored analysts and the test team in writing Hive queries.
- Worked on features such as reading a Hive table from MapReduce and making it available to all data nodes via the distributed cache. Used both Hue and hand-written XML definitions for Oozie.
- Gained good experience with NoSQL databases.
- Extracted data from Teradata into HDFS using Sqoop.
- Participated in building a test cluster for implementing Kerberos authentication, including installing Cloudera Manager and Hue.
- Worked on different file formats (ORCFile, RCFile, SequenceFile, TextFile) and different compression codecs (Gzip, Snappy, LZO).
- Created HBase tables to load large sets of structured, semi-structured and unstructured data coming from LINUX, NoSQL and a variety of portfolios.
- Enabled speedy reviews and first mover advantages by using Oozie to automate data loading into the Hadoop Distributed File System and PIG to pre-process the data.
- Wrote Apache Pig scripts to process the HDFS data.
- Created Hive external tables to store the processed results in a tabular format for ad-hoc reporting.
- Wrote HDFS CLI commands.
- Managed and reviewed Hadoop log files.
- Tested raw data and executed performance scripts.
- Shared responsibility for administration of Hadoop, Hive and Pig.
- Developed shell scripts for creating the reports from Hive data.
- Wrote Pig UDFs to achieve the desired functionality.
- Involved in understanding the requirements and in knowledge transfer (KT) sessions.
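A minimal sketch of the kind of Hive UDF referenced above, assuming the classic Hive UDF API shipped with CDH 4.x; the class name and the normalization it performs are illustrative only:

```java
import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

// Simple Hive UDF that normalizes a string column to upper case.
public final class UpperCaseUDF extends UDF {
    public Text evaluate(Text input) {
        if (input == null) {
            return null;          // preserve NULLs rather than failing the query
        }
        return new Text(input.toString().toUpperCase());
    }
}
```

A UDF like this would typically be packaged as a JAR, added to the session with ADD JAR, and registered through CREATE TEMPORARY FUNCTION before being called from HiveQL.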
Confidential, Utah
Hadoop Developer
Environment: Hadoop, CDH 3.x, Hue, MapReduce, Hive, Pig, Sqoop, Oozie, NoSQL, Core Java, JDBC, J2EE, Oracle, MySQL, SVN, Eclipse, PuTTY, WinSCP, Shell Scripting, and Linux.
Responsibilities:
- Installed and configured MapReduce, Hive, and HDFS; implemented a CDH3 Hadoop cluster on CentOS. Assisted with performance tuning and monitoring.
- Worked on analyzing the Hadoop cluster and different big data analytic tools including Pig, HBase, Cassandra databases, and Sqoop.
- Responsible for building scalable distributed data solutions using Hadoop.
- Involved in loading data from LINUX file system to HDFS.
- Created HBase tables to store variable data formats of PII data coming from different portfolios.
- Implemented a script to transmit information from Oracle and MySQL to HBase using Sqoop.
- Implemented best income logic using Pig scripts and UDFs.
- Implemented test scripts to support test driven development and continuous integration.
- Worked on tuning the performance of Pig queries.
- Worked with application teams to install operating system and Hadoop updates, patches, and version upgrades as required.
- Responsible for managing data coming from different sources.
- Loaded and transformed large sets of structured, semi-structured, and unstructured data.
- Provided cluster coordination services through ZooKeeper.
- Experience in managing and reviewing Hadoop log files.
- Managed jobs using the Fair Scheduler.
- Responsible for cluster maintenance, adding and removing cluster nodes, cluster monitoring and troubleshooting, manage and review data backups, manage and review Hadoop log files.
- Installed Oozie workflow engine to run multiple Hive and Pig jobs.
- Analyzed large amounts of data sets to determine optimal way to aggregate and report on it.
- Supported setting up the QA environment and updating configurations for implementing scripts with Pig and Sqoop. Developed multiple MapReduce jobs in Java for data cleaning and preprocessing (a representative mapper sketch appears after this list).
- Assisted with data capacity planning and node forecasting.
- Collaborated with the infrastructure, network, database, application and BI teams to ensure data quality and availability.
- Acted as administrator for Pig, Hive, and HBase, installing updates, patches, and upgrades.
- Wrote script files for processing data and loading it into HDFS.
- Wrote HDFS CLI commands.
- Managed and reviewed Hadoop log files.
- Tested raw data and executed performance scripts.
- Shared responsibility for administration of Hadoop, Hive and Pig.
- Involved in understanding the requirements and in knowledge transfer (KT) sessions.
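A minimal sketch of the kind of data-cleaning MapReduce job mentioned above, assuming the standard Hadoop mapreduce API; the tab-delimited layout, expected field count, and class name are illustrative assumptions rather than project specifics:

```java
import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Map-only cleaning step: drops malformed records and trims whitespace from fields.
public class CleaningMapper extends Mapper<LongWritable, Text, NullWritable, Text> {

    private static final int EXPECTED_FIELDS = 5; // illustrative record width

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String[] fields = value.toString().split("\t", -1);
        if (fields.length != EXPECTED_FIELDS) {
            context.getCounter("cleaning", "malformed").increment(1);
            return; // skip malformed lines, counting them for later review
        }
        StringBuilder cleaned = new StringBuilder();
        for (int i = 0; i < fields.length; i++) {
            if (i > 0) {
                cleaned.append('\t');
            }
            cleaned.append(fields[i].trim());
        }
        context.write(NullWritable.get(), new Text(cleaned.toString()));
    }
}
```

In a job of this shape the driver would set the mapper class and zero reducers, writing the cleaned records straight back to HDFS for downstream Hive or Pig processing.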
Confidential
Java Developer
Environment: Java, JSP, HTML, CSS, MySQL, JDBC, Eclipse IDE, Spring MVC, Git.
Responsibilities:
- Interacted with the clients to understand business requirements.
- Analyzed and developed Use Case diagrams, Sequence diagrams and Activity diagrams using UML.
- Involved in the development of application using Core Java, and JDBC.
- Worked with various IDEs such as Eclipse and NetBeans.
- Used Core Java concepts in the application such as multithreaded programming and thread synchronization using the wait, notify, and join methods.
- Created cross-browser-compatible and standards-compliant CSS-based page layouts.
- Performed Unit Testing on the applications that are developed.
- Involved in Scrum meetings; developed features and fixed issues.
- Worked with SVN commands.
- Configured JDBC connection pooling to access the MySQL database (a representative pooling sketch appears after this list).
- Wrote many SQL procedures and SQL queries.
- Built and deployed EAR and JAR files on test, stage, and production systems.
- Designed UML diagrams (sequence and class diagrams).
- Created database objects such as tables and views.
- Performed regression testing, evaluated response times, and resolved connection pooling issues.
- Involved in deployment activities.
- Performed performance and unit testing.
- Conducted KT sessions for new team members on functionality and high-level architecture.
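A minimal sketch of JDBC connection pooling against MySQL as described above, assuming Apache Commons DBCP2 and the MySQL JDBC driver are on the classpath; the URL, credentials, and pool sizes are placeholder values:

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import org.apache.commons.dbcp2.BasicDataSource;

// Shared pooled DataSource for MySQL; created once and reused across requests.
public class ConnectionPool {

    private static final BasicDataSource DATA_SOURCE = new BasicDataSource();

    static {
        DATA_SOURCE.setUrl("jdbc:mysql://localhost:3306/appdb"); // placeholder URL
        DATA_SOURCE.setUsername("app_user");                     // placeholder credentials
        DATA_SOURCE.setPassword("secret");
        DATA_SOURCE.setMaxTotal(20); // cap on concurrent connections
        DATA_SOURCE.setMaxIdle(5);   // keep a few idle connections warm
    }

    // Borrows a connection from the pool; closing it returns it to the pool.
    public static Connection getConnection() throws SQLException {
        return DATA_SOURCE.getConnection();
    }

    public static void main(String[] args) throws Exception {
        try (Connection conn = getConnection();
             PreparedStatement ps = conn.prepareStatement("SELECT 1");
             ResultSet rs = ps.executeQuery()) {
            while (rs.next()) {
                System.out.println(rs.getInt(1));
            }
        }
    }
}
```

The same idea applies whether the pool is configured in code as above or declared in the application server's data source configuration; the application code only sees a DataSource that hands out reusable connections.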
Confidential
Java Developer
Environment: Java, MySQL, JDBC, Eclipse IDE, SOAP, SVN.
Responsibilities:
- Interacted with the clients to understand business requirements.
- Analyzed and developed Use Case diagrams, Sequence diagrams and Activity diagrams using UML.
- Involved in the development of application using Core Java, and JDBC.
- Worked with various IDEs such as Eclipse and NetBeans.
- Worked with SVN commands.
- Used stateless session beans (a minimal bean sketch appears after this list).
- Configured JDBC connection pooling to access the MySQL database.
- Wrote many SQL procedures and SQL queries.
- Built and deployed EAR and JAR files on test, stage, and production systems.
- Designed UML diagrams (sequence and class diagrams).
- Created database objects such as tables and views.
- Performed regression testing, evaluated response times, and resolved connection pooling issues.
- Performed performance and unit testing.
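A minimal sketch of a stateless session bean of the kind mentioned above, assuming an EJB 3.x container is available in the deployment environment; the class name and business method are illustrative only:

```java
import javax.ejb.Stateless;

// Stateless session bean exposing a simple business operation;
// the container pools and reuses bean instances across client calls.
@Stateless
public class OrderService {

    // Illustrative business method with no conversational state.
    public double applyDiscount(double amount, double discountPercent) {
        return amount - (amount * discountPercent / 100.0);
    }
}
```

Because the bean holds no per-client state, the container can dispatch any pooled instance to serve a request, which is what makes stateless session beans a good fit for simple transactional business logic.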
