Sr. Big Data Architect Resume
Charlotte, NC
SUMMARY
- 9+ years of professional experience covering Analysis, Design, Development, Integration, Deployment and Maintenance of quality software applications using Java/J2EE and Hadoop technologies.
- Experienced in installing, configuring and testing Hadoop ecosystem components (such as Hive, Pig and Sqoop) on Linux/UNIX, including Hadoop administration.
- Expertise in Java, Hadoop MapReduce, Pig, Hive, Oozie, Sqoop, Flume, NiFi, ZooKeeper, Impala and NoSQL databases.
- Excellent experience with the Hadoop ecosystem and in-depth understanding of MapReduce and the Hadoop infrastructure.
- Excellent experience with the Amazon, Cloudera and Hortonworks Hadoop distributions, and in maintaining and optimizing AWS infrastructure (EMR, EC2, S3, EBS).
- Expertise in developing Spark code using Scala and Spark-SQL/Streaming for faster testing and processing of data.
- Excellent knowledge of Hadoop architecture and ecosystem components such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, YARN and the MapReduce programming paradigm.
- Experienced working with Hadoop Big Data technologies (HDFS and MapReduce programs), Hadoop ecosystem tools (HBase, Hive, Pig) and the NoSQL database MongoDB.
- Expertise in data development on the Hortonworks HDP platform and Hadoop ecosystem tools such as HDFS, Spark, Zeppelin, Hive, HBase, Sqoop, Flume, Atlas, Solr, Pig, Falcon, Oozie, Hue, Tez, Apache NiFi and Kafka.
- Experienced in using the column-oriented NoSQL database HBase.
- Extensive experience working with semi-structured and unstructured data by implementing complex MapReduce programs using design patterns.
- Experienced with major components of the Hadoop ecosystem including Hive, Sqoop and Flume, with knowledge of the MapReduce/HDFS framework.
- Experienced in applying MapReduce design patterns to solve complex MapReduce problems.
- Excellent knowledge of Talend Big Data integration for moving business data into Hadoop and NoSQL stores.
- Hands-on programming experience in technologies such as Java, J2EE, HTML and XML, and excellent working knowledge of Sqoop and Flume for data processing.
- Expertise in loading data from different data sources (such as Teradata and DB2) into HDFS using Sqoop and loading it into partitioned Hive tables.
- Working experience in developing applications involving Big Data technologies such as MapReduce, HDFS, Hive, Sqoop, Pig, Oozie, HBase, NiFi, Spark, Scala, Kafka and ZooKeeper.
- Experienced in Hadoop cluster maintenance, including data and metadata backups, file system checks, commissioning and decommissioning nodes, and upgrades.
- Extensive experience writing custom MapReduce programs for data processing and UDFs for both Hive and Pig in Java (a minimal UDF sketch follows this summary).
- Strong experience in analyzing large data sets by writing Pig scripts and Hive queries.
- Extensive experience working with structured data using HiveQL, performing join operations, writing custom UDFs and optimizing Hive queries.
- Experienced in importing and exporting data between HDFS and relational databases using Sqoop, with expertise in job workflow scheduling and monitoring tools such as Oozie.
- Experienced with Apache Flume for collecting, aggregating and moving large volumes of data from sources such as web servers and telnet sources.
- Extensively designed and executed SQL queries in order to ensure data integrity and consistency at the backend.
- Strong experience in architecting batch-style, large-scale distributed computing applications using tools such as Flume, MapReduce and Hive.
- Experience using various Hadoop distributions (Cloudera, Hortonworks, MapR, etc.) to fully implement and leverage new Hadoop features.
- Worked on custom Pig Loader and Storage classes to handle a variety of data formats such as JSON and compressed CSV.
- Experienced in working with scripting technologies such as Python and UNIX shell scripts.
- Strong experience working in UNIX/Linux environments and writing shell scripts.
- Excellent knowledge of Hadoop architecture and ecosystem components such as HDFS, node configuration, YARN, MapReduce, Sentry, Spark, Falcon, HBase, Hive, NiFi, Pig and Ranger.
- Excellent knowledge of and working experience in Agile and Waterfall methodologies.
- Expertise in web page development using JSP, HTML, JavaScript, jQuery and Ajax.
- Experienced in writing database objects such as stored procedures, functions, triggers, PL/SQL packages and cursors for Oracle, SQL Server, MySQL and Sybase databases.
- Great team player and quick learner with effective communication, motivation and organizational skills, combined with attention to detail and a focus on business improvement.
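As a concrete illustration of the custom Hive UDF work noted above, the following is a minimal, hypothetical sketch; the class and column handling are illustrative and not taken from any project listed here.

```java
import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

// Minimal illustrative Hive UDF: normalizes a free-text column before joins or
// aggregations. Registered in Hive with ADD JAR / CREATE TEMPORARY FUNCTION.
public class NormalizeText extends UDF {
    public Text evaluate(Text input) {
        if (input == null) {
            return null;
        }
        return new Text(input.toString().trim().toLowerCase());
    }
}
```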
TECHNICAL SKILLS
Hadoop/Big Data Technologies: HDFS, MapReduce, Sqoop, Flume, Pig, Hive, Oozie, Impala, ZooKeeper, Cloudera Manager, Kafka, NiFi, and the NoSQL databases HBase, MongoDB and Cassandra.
Monitoring and Reporting: Tableau, custom shell scripts
Hadoop Distributions: Hortonworks, Cloudera, MapR
AWS: AWS EMR, AWS EC2 and AWS Redshift.
Build Tools: Maven, SQL Developer
Programming & Scripting: Java, HTML, JavaScript, jQuery, PL/SQL, SQL, Scala, Shell Scripting and Python.
Databases: Oracle, MySQL, MS SQL Server, Teradata and DB2.
Web Technologies: HTML, XML, JSON, CSS, jQuery, JavaScript, AngularJS
Version Control: SVN, CVS, GIT
Operating Systems: Linux, UNIX, Mac OS-X, Windows 8 and Windows 7
PROFESSIONAL EXPERIENCE
Sr. Big Data Architect
Confidential, Charlotte NC
Responsibilities:
- Designed the Hadoop infrastructure from scratch on Apache Hadoop and other platforms.
- Implemented solutions for ingesting data from various sources and processing the data at rest utilizing Big Data technologies such as Hadoop, MapReduce frameworks, HBase and Hive.
- Wrote scripts to distribute queries for performance test jobs in the Amazon data lake.
- Created Hive tables, loaded transactional data from Teradata using Sqoop, and worked with highly unstructured and semi-structured data of 2 petabytes in size.
- Performed data analysis, feature selection and feature extraction using Apache Spark machine learning and streaming libraries in Python.
- Developed MapReduce (YARN) jobs for cleaning, accessing and validating the data, and created Sqoop jobs with incremental load to populate Hive external tables.
- Developed optimal strategies for distributing web log data over the cluster, and imported and exported the stored web log data into HDFS and Hive using Sqoop.
- Installed and configured Apache Hadoop on multiple AWS EC2 nodes, and developed Pig Latin scripts to replace the existing legacy process with Hadoop, feeding the data to AWS S3.
- Responsible for building scalable distributed data solutions using Cloudera Hadoop and writing Pig scripts to transform raw data from several data sources into baseline data.
- Responsible for daily ingestion of data from the data lake to the CDB Hadoop tenant system, and for automating business reports on the data lake using Bash scripts in Unix and sending them to business owners.
- Installed and configured Hadoop, Hive, Sqoop and HDFS; developed multiple Python/Hive/Sqoop scripts for data cleaning and processing; and designed and developed automation test scripts using Python.
- Integrated Apache Storm with Kafka to perform web analytics, and uploaded clickstream data from Kafka to HDFS, HBase and Hive through the Storm integration.
- Designed and implemented the real-time analytics model using Kafka, Spark Streaming and NiFi, persisting the data to MongoDB.
- Analyzed the SQL scripts, designed the solution to implement them using PySpark, and implemented Hive Generic UDFs to incorporate business logic into Hive queries.
- Responsible for developing a data pipeline on Amazon AWS to extract data from weblogs and store it in HDFS, and uploaded streaming data from Kafka to HDFS, HBase and Hive by integrating with Storm.
- Analyzed the web log data using HiveQL to extract the number of unique visitors per day, page views, visit duration and the most visited page on the website.
- Designed and implemented custom NiFi processors that reacted to and processed data for the pipeline, and worked with developer teams to move data into HDFS through HDF NiFi.
- Supported data analysis projects using Elastic MapReduce on the Amazon Web Services (AWS) cloud and performed export and import of data into S3.
- Explored Spark to improve the performance and optimization of existing algorithms in Hadoop using Spark Context, Spark-SQL, DataFrames, pair RDDs and Spark on YARN.
- Worked on MongoDB using CRUD (Create, Read, Update and Delete) operations, indexing, replication and sharding features.
- Involved in designing the HBase row key to store text and JSON as key-value pairs, and designed the row key so that rows could be fetched and scanned in sorted order (a minimal sketch follows this job entry).
- Integrated Oozie with the rest of the Hadoop stack, supporting several types of Hadoop jobs out of the box (such as MapReduce, Pig, Hive and Sqoop) as well as system-specific jobs (such as Java programs and shell scripts).
- Worked on custom Talend jobs to ingest, enrich and distribute data in the Cloudera Hadoop ecosystem.
- Created Hive tables and worked on them using HiveQL, and designed and implemented static and dynamic partitioning and bucketing in Hive.
- Developed multiple POCs using PySpark and deployed them on the YARN cluster, compared the performance of Spark with Hive and SQL, and was involved in end-to-end implementation of ETL logic.
- Developed a NiFi workflow to pick up multiple retail files from an FTP location and move them to HDFS on a daily basis.
- Developed syllabus/curriculum data pipelines from syllabus/curriculum web services to HBase and Hive tables, and worked on cluster coordination services through ZooKeeper.
- Designed and created ETL jobs in Talend to load huge volumes of data into Cassandra, the Hadoop ecosystem and relational databases.
- Monitored workload, job performance and capacity planning using Cloudera Manager, built applications using Maven, and integrated with CI servers such as Jenkins to build jobs.
- Worked collaboratively with all levels of business stakeholders to architect, implement and test Big Data based analytical solutions from disparate sources.
- Created cubes in Talend to produce different types of aggregations over the data and visualize them, and participated in Agile practices including daily scrum meetings and sprint planning.
Environment: Hadoop, HDFS, MapReduce, Hive, Pig, HBase, Sqoop, Oozie, Maven, NiFi, Python, Shell Scripting, CDH, MongoDB, Cloudera, AWS (S3, EMR), SQL, Scala, Spark, PySpark, RDBMS, Java, HTML, JavaScript, Web Services, Kafka, Storm, Talend.
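A minimal, hypothetical sketch of the kind of sorted-scan row key design mentioned above; the table name, column family and payload are illustrative assumptions, not details from the project.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

// Illustrative composite row key: <source>|<reversed timestamp> so the newest
// events for a source sort first and can be read back with a prefix scan.
public class EventWriter {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Table table = conn.getTable(TableName.valueOf("web_events"))) {
            String source = "clickstream";
            long reversedTs = Long.MAX_VALUE - System.currentTimeMillis();
            byte[] rowKey = Bytes.toBytes(source + "|" + reversedTs);
            Put put = new Put(rowKey);
            put.addColumn(Bytes.toBytes("d"), Bytes.toBytes("payload"),
                    Bytes.toBytes("{\"page\":\"/home\",\"visitor\":\"abc\"}"));
            table.put(put);
        }
    }
}
```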
Sr. Big Data/Hadoop Developer Aug ’15 - Mar ’16
IRI, Chicago IL
Responsibilities:
- Responsible for installation and configuration of Hive, Pig, HBase and Sqoop on the Hadoop cluster, and created Hive tables to store the processed results in a tabular format.
- Configured Spark Streaming to receive real-time data from Apache Kafka and store the stream data to HDFS using Scala.
- Developed solutions to process data into HDFS and analyzed the data using MapReduce, Pig and Hive to produce summary results from Hadoop for downstream systems.
- Built servers on AWS: imported volumes, launched EC2 instances, and created security groups, auto-scaling, load balancers, Route 53, SES and SNS within the defined virtual private cloud; streamed an AWS log group into a Lambda function to create ServiceNow incidents.
- Hands-on work with a data lake cluster using Hortonworks Ambari on AWS with EC2 and S3.
- Wrote MapReduce code to process and parse data from various sources and store the parsed data into HBase and Hive using HBase-Hive integration (a minimal mapper sketch follows this job entry).
- Involved in loading and transforming large sets of structured, semi-structured and unstructured data, and analyzed them by running Hive queries and Pig scripts.
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs, Python and Scala, and created managed and external tables in Hive and loaded data from HDFS.
- Developed Spark code using Scala and Spark-SQL for faster processing and testing, and performed complex HiveQL queries on Hive tables.
- Scheduled several time-based Oozie workflows by developing Python scripts.
- Developed Pig Latin scripts using operators such as LOAD, STORE, DUMP, FILTER, DISTINCT, FOREACH, GENERATE, GROUP, COGROUP, ORDER, LIMIT, UNION and SPLIT to extract data from data files and load it into HDFS.
- Exported the data to RDBMS servers using Sqoop and processed that data for ETL operations.
- Worked on S3 buckets on AWS to store CloudFormation templates and worked on AWS to create EC2 instances.
- Designed the ETL data pipeline flow to ingest data from an RDBMS source into Hadoop using shell scripts, Sqoop packages and MySQL.
- End-to-end architecture and implementation of client-server systems using Scala, Akka, Java, JavaScript and related technologies on Linux.
- Used the Oozie workflow engine to manage interdependent Hadoop jobs and to automate several types of Hadoop jobs such as Java MapReduce, Hive, Pig and Sqoop.
- Implemented Hadoop on the AWS EC2 system using a few instances for gathering and analyzing data log files.
- Involved in Spark and Spark Streaming work, creating RDDs and applying transformation and action operations.
- Worked on installing the cluster, commissioning and decommissioning DataNodes, NameNode recovery, capacity planning and slot configuration.
- Created partitioned tables and loaded data using both the static and dynamic partition methods.
- Developed custom Apache Spark programs in Scala to analyze and transform unstructured data.
- Handled importing of data from various data sources, performed transformations using Hive and MapReduce, loaded data into HDFS, and extracted data from Oracle into HDFS using Sqoop.
- Used Kafka for publish-subscribe messaging as a distributed commit log, gaining experience with its speed, scalability and durability.
- Followed a Test-Driven Development (TDD) process, with extensive experience in Agile and Scrum methodologies.
- Implemented a POC to migrate MapReduce jobs into Spark RDD transformations using Scala, and scheduled MapReduce jobs in the production environment using the Oozie scheduler.
- Upgraded the Hadoop cluster from CDH3 to CDH4, set up a high-availability cluster and integrated Hive with existing applications.
- Involved in cluster maintenance, monitoring and troubleshooting, and managed and reviewed data backups and log files.
- Designed and implemented MapReduce jobs to support distributed processing using Java, Hive and Apache Pig.
- Analyzed the Hadoop cluster and different Big Data analytic tools including Pig, Hive, HBase and Sqoop.
- Improved performance by tuning Hive and MapReduce, and optimized Hive tables using techniques such as partitioning and bucketing to provide better query performance.
- Researched, evaluated and utilized new technologies, tools and frameworks around the Hadoop ecosystem.
Environment: HDFS, MapReduce, Hive, Sqoop, Pig, Flume, Vertica, Oozie Scheduler, Java, Shell Scripts, Teradata, Oracle, HBase, MongoDB, Cassandra, Cloudera, AWS, JavaScript, JSP, Kafka, Spark, Scala, ETL, Python.
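A minimal, hypothetical sketch of the kind of log-parsing MapReduce code described above; the tab-delimited field layout and (page, count) output are illustrative assumptions.

```java
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Illustrative mapper: parses raw web-log lines and emits (page, 1) pairs,
// which a summing reducer turns into per-page view counts.
public class PageViewMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text page = new Text();

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String[] fields = value.toString().split("\t");
        if (fields.length > 2 && !fields[2].isEmpty()) {   // field layout is assumed
            page.set(fields[2]);
            context.write(page, ONE);
        }
    }
}
```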
Hadoop/Big Data Developer
Confidential, Harrisburg, PA
Responsibilities:
- Responsible for building scalable distributed data solutions using Hadoop.
- Designed the projects using MVC architecture providing multiple views using the same model and thereby providing efficient modularity and scalability
- Built custom Talend jobs to ingest, enrich and distribute data in the Cloudera Hadoop ecosystem.
- Downloaded the data generated by sensors tracking the Confidential's body activities; the data was collected into HDFS from online aggregators by Kafka.
- Explored Spark to improve the performance and optimization of existing algorithms in Hadoop using Spark Context, Spark-SQL, DataFrames, pair RDDs and Spark on YARN with Scala.
- Implemented Spark Core in Scala to process data in memory.
- Performed job functions using Spark APIs in Scala for real-time analysis and fast querying.
- Used Spark Streaming to collect this data from Kafka in near real time, perform the necessary transformations and aggregations on the fly to build the common learner data model, and persist the data in a NoSQL store (HBase).
- Applied object-oriented and functional programming in Scala.
- Used Hadoop's Pig, Hive and MapReduce to analyze health insurance data, extracting data sets for meaningful information such as medicines, diseases, symptoms, opinions and geographic region details.
- Enhanced and optimized product Spark code to aggregate, group and run data mining tasks using the Spark framework.
- Handled importing of data from various data sources, performed transformations using MapReduce and Spark, and loaded data into HDFS.
- Developed Oozie workflows to orchestrate a series of Pig scripts that cleanse data, such as removing personal information or merging many small files into a handful of large, compressed files, using Pig pipelines in the data preparation stage.
- Used Pig in three distinct workloads: pipelines, iterative processing and research.
- Used Pig UDFs written in Python and Java, and used sampling of large data sets (a minimal Java UDF sketch follows this job entry).
- Involved in moving all log files generated from various sources to HDFS for further processing through Flume, and processed the files using PiggyBank functions.
- Extensively used Pig to communicate with Hive using HCatalog and with HBase using handlers.
- Created Pig Latin scripts and Sqoop scripts.
- Involved in transforming data from legacy tables to HDFS and HBase tables using Sqoop.
- Implemented exception tracking logic using Pig scripts
- Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports for the BI team
- Analyzed large data sets to determine the optimal way to aggregate and report on them.
Environment: Hadoop, MapReduce, Spark, Shark, Kafka, HDFS, Hive, Pig, Oozie, Core Java, Eclipse, HBase, Flume, Cloudera, Oracle 10g, UNIX Shell Scripting, Scala, MongoDB, Cassandra, Python.
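A minimal, hypothetical sketch of the kind of Java Pig UDF used in the cleansing pipeline above; the masking rule and field are illustrative assumptions.

```java
import java.io.IOException;
import org.apache.pig.EvalFunc;
import org.apache.pig.data.Tuple;

// Illustrative Pig EvalFunc UDF: masks an identifier column so that personal
// information can be dropped during the data-preparation stage.
public class MaskId extends EvalFunc<String> {
    @Override
    public String exec(Tuple input) throws IOException {
        if (input == null || input.size() == 0 || input.get(0) == null) {
            return null;
        }
        String id = input.get(0).toString();
        return id.length() <= 4 ? "****" : "****" + id.substring(id.length() - 4);
    }
}
```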
Sr. Java/J2EE Developer
Confidential, Columbus, OH
Responsibilities:
- Developed and coordinated complex, high-quality solutions for clients using J2SE, J2EE, Servlets, JSP, HTML, Struts, Spring MVC, SOAP, JavaScript, jQuery, JSON and XML, and used the JSF framework to implement the MVC design pattern.
- Wrote JSF managed beans, converters and validators following framework standards, and used explicit and implicit navigation for page navigation.
- Designed and developed persistence layer components using the Hibernate ORM tool, and designed the UI using JSF tags, Apache Tomahawk and RichFaces.
- Developed web services using XML, REST and SOAP APIs to provide business services to other applications.
- Used Oracle 10g as the backend to store and fetch data, and used IDEs such as Eclipse and NetBeans with Maven integration.
- Created real-time reporting systems and dashboards using XML, MySQL and Perl.
- Worked on RESTful web services that enforced a stateless client-server model and supported JSON (migrating a few services from SOAP to REST), and was involved in detailed analysis based on the requirement documents (a minimal REST resource sketch follows this job entry).
- Involved in design, development and testing of web application and integration projects using object-oriented technologies such as Core Java, J2EE, Struts, JSP, JDBC, Spring Framework, Hibernate, JavaBeans, Web Services (REST/SOAP), XML, XSLT, XSL and Ant.
- Designed and implemented SOA-compliant management and metrics infrastructure for the Mule ESB, utilizing the SOA management components.
- Used Node.js for server-side rendering and implemented Node.js modules to integrate with designs and requirements.
- Used JAX-WS for interaction between the front-end and back-end modules, as they run on two different servers.
- Responsible for offshore deliverables, provided design/technical help to the team, and reviewed work to meet quality and timeline goals.
- Migrated the existing Struts application to the Spring MVC framework, and proposed and implemented numerous solutions to improve performance and stabilize the application.
- Developed graphical user interfaces using HTML, XML/XSLT and JSPs for user interaction and CSS for styling.
- Extensively used LDAP with Microsoft Active Directory for user authentication at login, and developed unit test cases using JUnit.
- Created the project from scratch using AngularJS for the frontend and Node.js with Express for the backend.
- Involved in developing Perl scripts and other scripts such as JavaScript, and designed and documented REST/HTTP APIs, including JSON data formats.
- Used Tomcat as the web server to deploy the OMS web application, and used the SOAP::Lite module to communicate with different web services based on the given WSDLs.
- Prepared technical reports and documentation manuals during program development.
Environment: JDK 1.5, JSF, Hibernate 3.0, Struts, JIRA, Node.js, HTML, CSS, JSON, JSP, CruiseControl, Log4j, Tomcat, LDAP, JUnit, NetBeans, Windows/UNIX.
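A minimal, hypothetical sketch of the kind of stateless JSON REST resource described above; JAX-RS is assumed here, and the resource path and payload are illustrative.

```java
import javax.ws.rs.GET;
import javax.ws.rs.Path;
import javax.ws.rs.PathParam;
import javax.ws.rs.Produces;
import javax.ws.rs.core.MediaType;
import javax.ws.rs.core.Response;

// Illustrative stateless REST endpoint returning JSON for other applications.
@Path("/accounts")
public class AccountResource {

    @GET
    @Path("/{id}")
    @Produces(MediaType.APPLICATION_JSON)
    public Response getAccount(@PathParam("id") String id) {
        // Hypothetical payload; a real service would look the account up first.
        String json = "{\"id\":\"" + id + "\",\"status\":\"ACTIVE\"}";
        return Response.ok(json).build();
    }
}
```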
Java Developer
Confidential
Responsibilities:
- Performed analysis of the client requirements based on the detailed design documents.
- Developed Use Cases, Class Diagrams, Sequence Diagrams and Data Models.
- Developed Struts forms and actions for validation of user request data and application functionality, developed JSPs with Struts custom tags, and implemented JavaScript data validation.
- Developed programs to access the database using the JDBC thin driver to execute queries, prepared statements and stored procedures, and to manipulate data in the database (a minimal sketch follows this job entry).
- Wrote the Hibernate configuration file and mapping files, and defined persistence classes to persist data into the Oracle database.
- Used JavaScript for web page validation and Struts Validator for server-side validation.
- Designed the database and coded SQL, PL/SQL, triggers and views using IBM DB2.
- Implemented the Struts framework based on the MVC design pattern and the Session Facade pattern using session and entity beans.
- Developed clickable prototypes in HTML, DHTML, Photoshop, CSS, JSP, and JavaScript.
- Developed Message Driven Beans for asynchronous processing of alerts.
- Used ClearCase for source code control and JUnit for unit testing.
- Involved in peer code reviews and performed integration testing of the modules. Followed coding and documentation standards.
Environment: Java, Struts, JSP, JDBC, XML, JUnit, Rational Rose, CVS, DB2, Windows.
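A minimal, hypothetical sketch of the kind of JDBC prepared-statement access described above; the connection URL, credentials and query are placeholders.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

// Illustrative JDBC access via the Oracle thin driver; values are placeholders.
public class OrderDao {
    public static void main(String[] args) throws Exception {
        String url = "jdbc:oracle:thin:@dbhost:1521:ORCL";
        try (Connection conn = DriverManager.getConnection(url, "app_user", "secret");
             PreparedStatement ps = conn.prepareStatement(
                     "SELECT order_id, status FROM orders WHERE customer_id = ?")) {
            ps.setLong(1, 1001L);
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    System.out.println(rs.getLong("order_id") + " " + rs.getString("status"));
                }
            }
        }
    }
}
```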