Solutions Engineer Resume
Bellevue, WA
PROFESSIONAL SUMMARY:
- 8+ years of extensive professional IT experience, including 4+ years of Hadoop/Big Data experience; capable of processing large sets of structured, semi-structured and unstructured data and supporting systems application architecture.
- 4+ years of experience in Big Data with Hadoop and Hadoop ecosystem components like MapReduce, Sqoop, Flume, Kafka, Pig, Hive, Spark, Storm, HBase, Oozie, and Zookeeper.
- Developed Spark applications using Scala for easy Hadoop transitions.
- Used the Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive; developed Spark code and Spark SQL/Streaming for faster testing and processing of data.
- Implemented Oozie workflows for ETL processes.
- Good knowledge of Spark and Scala.
- Hands-on experience with build and development tools like Maven, Ant, Log4j and JUnit.
- Worked with data extraction, transformation and load in Hive, Pig and HBase.
- Hands-on experience in designing and developing applications in Spark using Scala to compare the performance of Spark with Hive and SQL/Oracle.
- Experience in understanding the security requirements for Hadoop and integrating with Kerberos authentication and authorization infrastructure.
- Good knowledge of working with NoSQL databases, including Cassandra and MongoDB, and of their implementation.
- Hands-on experience with streaming data ingestion and processing.
- Experience in designing different time-driven and data-driven automated workflows using Oozie.
- Used Conviva and Spark MLlib for predictive intelligence, customer segmentation and smooth maintenance in Spark Streaming.
- Expertise in writing real-time processing applications using spouts and bolts in Storm.
- Experience in configuring various topologies in Storm to ingest and process data on the fly from multiple sources and aggregate it into a central Hadoop repository.
- Strong acumen in choosing efficient Hadoop ecosystem components and providing the best solutions to Big Data problems.
- Well versed in design and architecture principles for implementing Big Data systems.
- Experience in configuring Zookeeper to coordinate the servers in clusters and maintain data consistency.
- Experienced in data migration from relational databases to the Hadoop platform using Sqoop.
- Experienced in migrating ETL transformations using Pig Latin scripts, transformations and join operations.
- Good understanding of MPP databases such as HP Vertica and Impala.
- Hands-on experience in configuring and working with Flume to load data from multiple sources directly into HDFS.
- Experience in using design patterns, Java, JSP, Servlets, JavaScript, HTML, jQuery, AngularJS, jQuery Mobile, JBoss 4.2.3, XML, WebLogic, SQL, PL/SQL, JUnit, Apache Tomcat and Linux.
- Expertise in relational databases like Oracle, MySQL and SQL Server.
- Experience in implementing projects both in Agile and Waterfall methodologies.
- Well versed with Sprint ceremonies that are practiced in Agile methodology.
- Strong experience with data warehousing ETL concepts using Informatica PowerCenter, OLAP, OLTP and AutoSys.
- Highly involved in all phases of the SDLC, including analysis, design, development, integration, implementation, debugging and testing of software applications in client-server environments, object-oriented technology and web-based applications.
- Strong analytical and problem-solving skills; highly motivated, good team player with very good communication and interpersonal skills.
TECHNICAL SKILLS:
Big Data Ecosystem: Hadoop, MapReduce, Pig, Hive, YARN, Kafka, Flume, Sqoop, Impala, Oozie, Zookeeper, Spark, Ambari, Solr, Mahout, MongoDB, Cassandra, Avro, Storm, Parquet and Snappy
Hadoop Distributions: Cloudera (CDH3, CDH4, and CDH5), Hortonworks, MapR, Apache and EMR
Languages: Java, Python, Ruby, SQL, HTML, DHTML, Scala, JavaScript, XML and C/C++.
NoSQL Databases: Cassandra, MongoDB, HBase, Neo4j
Java Technologies: Servlets, JavaBeans, JSP, JDBC, JNDI, EJB and Struts.
XML Technologies: XML, XSD, DTD, JAXP (SAX, DOM), JAXB.
Development Methodologies: Agile, Waterfall.
Web Design Tools: HTML, DHTML, AJAX, JavaScript, jQuery, CSS, AngularJS, ExtJS and JSON.
Development / Build Tools: Eclipse, Ant, Maven, IntelliJ, JUnit and Log4j.
Frameworks: Struts, Spring and Hibernate.
App/Web servers: WebSphere, WebLogic, JBoss and Tomcat.
DB Languages: MySQL, PL/SQL, PostgreSQL and Oracle.
RDBMS: Teradata, Oracle 9i, 10g, 11i, MS SQL Server, MySQL and DB2.
Operating systems: UNIX, LINUX, Mac OS and Windows Variants.
ETL Tools: Talend, Informatica, Pentaho.
PROFESSIONAL EXPERIENCE:
Solutions Engineer
Confidential, Bellevue, WA
Responsibilities:
- Gained an understanding of the functionality of the various components built into the monitoring framework.
- Gained an understanding of the various tools in the enterprise data warehouse that were built and used exclusively by Confidential.
- Helped partner services with tool deprecation analysis.
- Built up working knowledge of several Hadoop-based tools.
- Documented what each tool is and how to run it, and concluded whether it could be deprecated.
- Worked on the EDW-JSON SerDe (Serializer and Deserializer) and wrote integration tests for it.
- Worked on a production issue where the Storage API was going down at peak times.
- Used JMeter to create a thread group that simulates as many users as the production server experiences, and load tested the Storage API and database.
- Created an instance in AWS Elastic Beanstalk (EBS), which is currently used in the lab to investigate the performance of the application.
- Created an RDS instance in AWS for the database and changed the instance class types to scale up the application.
- Worked with different data sources such as DriverManager, Hikari DataSource and Apache Tomcat DataSource on the Storage API side.
- Documented the results observed when running the maximum number of threads against the API, along with their error percentages, which helped in choosing the data source and the size of the RDS instance class.
- Worked on stored procedures (SPROCS) on the database side that perform all the necessary validations, and made recommendations for improving performance and keeping the API and database loosely coupled, per the business requirement.
- Concluded that the RDS instance class should be bumped up to m4.2xlarge, which helps the storage database withstand more connections than are available in the connection pool.
- Created alarms for all the metrics sent to AWS CloudWatch.
- Created an AWS CloudFormation template for creating the alarms in AWS CloudWatch.
- Used the AWS CLI (Command Line Interface) to deploy the CloudFormation template and to deploy the project's WAR file to AWS Elastic Beanstalk (EBS).
- Performed deprecation analysis on legacy tools like DbSync and Sqoop Hive Merge.
- Worked on a production issue in the Data Squeeze project involving the small files problem in Hadoop, where SEQ files could not be compacted using CombineFileInputSplit.
- Created a separate mapper class called SeqCompactionMapper, which reads the input using CombineFileInputFormat and compacts the SEQ files.
- Wrote integration tests for the SEQ file format that give 100% test coverage and helped in making the project open source.
- Worked extensively on the Compaction Utility tool, which also helped eliminate the small files problem in Hadoop.
- Worked on a tool named MSSQL Loader, a wrapper around the SQL Server JDBC driver that accepts queries from users, executes them and displays the results in the shell.
- Worked on the Confidential internal tool called Backup Loader, which downloads files from NetApp servers and loads them into tables in HDFS.
- Worked on the tool named Mongo Loader.
- Extended Oozie by creating custom action nodes like SqlAction, which executes SQL commands against the supported databases.
Environment: Hadoop, HDFS, Hive, Oozie, MapReduce, DbSync, Hive Merge, Compaction Utility, MSSQL Loader, Mongo Loader, Backup Loader, AWS EBS, AWS RDS, AWS CloudWatch, AWS CloudFormation, MSSQL Server, SPROCS, JMeter
Sr. Hadoop/Spark Developer
Confidential, Atlanta, GA
Responsibilities:
- Expertise in designing and deploying Hadoop clusters and different Big Data analytic tools including Pig, Hive, HBase, Oozie, Sqoop, Flume, Spark and Impala with the Cloudera distribution.
- Responsible for developing a data pipeline using Flume and Sqoop to extract data from weblogs and store it in HDFS.
- Implemented Spark scripts using Scala and Spark SQL to load Hive tables into Spark for faster processing of data.
- Active member of the team developing a POC on streaming data using Apache Kafka and Spark Streaming.
- Developed programs in Spark, based on the application, for faster data processing than standard MapReduce programs.
- Extensive experience importing and exporting data using stream processing platforms like Flume and Kafka.
- Developed and configured Kafka brokers to pipeline server log data into Spark Streaming.
- Involved in performance tuning of Spark jobs using caching and taking full advantage of the cluster environment.
- Configured, deployed and maintained multi-node Dev and Test Kafka clusters.
- Configured Spark Streaming to receive real-time data from Kafka and store it in HDFS.
- Processed inputs from multiple data sources in the same reducer using GenericWritable and MultipleInputs.
- Ingested data from RDBMS, performed data transformations, and then exported the transformed data to Cassandra as per the business requirement.
- Extracted data from various data sources including OLE DB, Greenplum, Excel, flat files and XML.
- Broke down many-to-many relationships into one-to-many and many-to-one relationships to satisfy SSIS.
- Configured, supported and maintained all network, firewall, storage, load balancers, operating systems, and software in AWS EC2.
- Built and configured users and privileges in the back end of EMR.
- The whole process included complex data extraction, cleansing, filtering, mapping, validation, transformation, and loading precisely into various dimension and fact tables.
- Created a star schema data warehouse to store all KPIs, offices, reporting periods and thresholds with valid dates, along with other information.
- Created Cassandra tables to load large sets of semi-structured data coming from various sources.
- Scheduled multiple Hive and Pig jobs through the Oozie workflow engine using Python.
- Implemented ETL standards utilizing proven data processing patterns with open source standard tools like Talend and Pentaho for more efficient processing.
- Developed MapReduce ETL in Java/Pig and data validation using Hive.
- Involved in running Hadoop Streaming jobs to process terabytes of text data. Worked with different file formats such as Text, Sequence files, Avro, ORC and Parquet.
- Implemented fine-tuning mechanisms like indexing, partitioning and bucketing to tune the Teradata/Hive databases, which helped business users fetch reports more efficiently.
- Troubleshot and fine-tuned Teradata SQL scripts provided by analysts and developers.
- Worked with the data science team to build statistical models with Spark MLlib and PySpark.
- Performed data analysis with Cassandra using Hive External tables.
- Exported the analyzed data to Cassandra using Sqoop to generate reports for the BI team.
- Wrote complex Hive queries involving external Hive tables, dynamically partitioned on date, which store a rolling time window of user viewing history.
- Worked on different file formats (ORCFILE, TEXTFILE) and different Compression Codecs (GZIP, SNAPPY, LZO).
- Used the Spark API over Hadoop YARN as the execution engine for data analytics using Hive.
- Implemented the YARN Capacity Scheduler in various environments and tuned configurations according to application-wise job loads.
- Good experience in writing Map Reduce programs in Java on MRv2 / YARN environment.
- Developed workflows in Oozie to automate the tasks of loading data into HDFS and pre-processing with Pig and Hive.
Environment: Hadoop, HDFS, Hive, Flume, MapReduce, AWS EC2, EMR, Sqoop, Kafka, Spark, Spark MLlib, PySpark, YARN, Pig, Cassandra, Oozie, Shell Scripting, Scala, Maven, Java, JUnit, Agile methodologies, MySQL.
Hadoop Developer
Confidential, Connecticut
Responsibilities:
- Converted the existing relational database model to the Hadoop ecosystem.
- Worked with Linux systems and RDBMS database on a regular basis in order to ingest data using Sqoop.
- Developed Schedulers that communicated with the Cloud based services (AWS) to retrieve the data.
- Managed and reviewed Hadoop and HBase log files.
- Designed and implemented Hive queries and functions for evaluation, filtering, loading and storing of data.
- Created Hive tables and worked on them using HiveQL.
- Developed data pipeline using Kafka and Spark to store data into HDFS.
- Performed Big Data processing using Scala code, AWS, and Redshift.
- Involved in data acquisition, data pre-processing and data exploration for a telecommunication project in Scala.
- Involved in performing linear regression using the Scala API and Spark.
- Continuously monitored and managed the Hadoop cluster through Cloudera Manager.
- Involved in review of functional and non-functional requirements.
- Implemented Frameworks using Java and Python to automate the ingestion flow.
- Responsible for managing data coming from different sources.
- Loaded CDRs from relational databases using Sqoop, and from other sources using Flume, into the Hadoop cluster.
- Experience in processing large volumes of data and in parallel execution of processes using Talend functionality.
- Involved in loading data from UNIX file system and FTP to HDFS.
- Used the external tables in Impala for data analysis.
- Developed Hive queries to analyze the output data.
- Developed workflow in Oozie to automate the tasks of loading the data into HDFS and pre-processing with Pig.
- Used Zookeeper to provide coordination services to the cluster.
- Experience in importing and exporting data between HDFS and relational database systems using Sqoop and Flume.
- Created Hive queries that helped market analysts spot emerging trends by comparing fresh data with reference tables and historical metrics.
- Designed and implemented Spark jobs to support distributed data processing.
- Experience in optimizing Map Reduce Programs using combiners, partitioners and custom counters for delivering the best results.
- Wrote shell scripts to monitor the health of Hadoop daemon services and respond accordingly to any warning or failure conditions.
- Used Solr to enable indexing for searching on non-primary-key columns in Cassandra keyspaces.
- Involved in Hadoop cluster tasks like adding and removing nodes without any effect on running jobs and data.
- Followed agile methodology for the entire project.
- Installed and configured Apache Hadoop, Hive and Pig environment.
Environment: Hadoop, HDFS, Pig, Hive, Scala, Flume, Sqoop, Impala, AWS, Redshift, Oozie, Python, Shell Scripting, SQL, Talend, Spark, HBase, Linux (Ubuntu), Kafka.
Hadoop Developer
Confidential, VT
Responsibilities:
- Worked on importing data from various sources and performed transformations using MapReduce and Hive to load data into HDFS.
- Responsible for building scalable distributed data solutions using Hadoop. Wrote various Hive and Pig scripts.
- Created HBase tables to store variable data formats coming from different portfolios. Performed real-time analytics on HBase using the Java API and REST API.
- Involved in pivoting HDFS data from rows to columns and columns to rows.
- Moved all log/text files generated by various products into HDFS location.
- Experienced in managing and reviewing the Hadoop log files.
- Created HBase tables to load large sets of structured, semi-structured and unstructured data coming from UNIX, NoSQL and a variety of portfolios.
- Analysed HBase data in Hive by creating external partitioned and bucketed tables.
- Wrote MapReduce code that takes log files as input, parses them and structures them in tabular format to facilitate effective querying of the log data.
- Used Hive and Impala in the data exploration stage to gain insights into the customer data.
- Experienced in joining different data sets using Pig join operations and querying them with Pig scripts.
- Installed and configured Hive and wrote Hive UDFs.
- Used Oozie and Zookeeper for workflow scheduling and monitoring.
- Actively involved in loading data from UNIX file system to HDFS.
- Loaded cache data into HBase using Sqoop.
- Worked with Cassandra Query Language (CQL) to execute queries on data persisted in the Cassandra cluster.
- Developed HiveQL queries, mappings, tables and external tables in Hive for analysis across different banners, and worked on partitioning, optimization, compilation and execution.
- Converted sequential file data formats into Avro format with Snappy compression.
- Performed development (code) reviews to ensure that code functionality meets business requirements and that standards are followed.
- Developed Shell scripts to automate routine DBA tasks.
Environment: HDFS, MapReduce, Pig, Hive, Sqoop, Flume, HBase, Java, Maven, Avro, Cassandra, Cloudera, Eclipse and Shell Scripting.
Java Developer
Confidential
Responsibilities:
- Understood the requirements, technical aspects and architecture of the existing system.
- Helped design application development using the Spring MVC framework, and front-end interactive page design using HTML, JSP, JSTL, CSS, JavaScript, jQuery and AJAX.
- Utilized various JavaScript and jQuery libraries and AJAX for form validation and other interactive features.
- Involved in writing SQL queries for fetching data from Oracle database.
- Developed a multi-tiered web application using J2EE standards.
- Designed and developed Web Services to store and retrieve user profile information from database.
- Used Apache Axis to develop web services and SOAP protocol for web services communication.
- Used the Spring DAO concept to interact with the database using the JDBC template and Hibernate template.
- Well experienced in deploying and configuring applications on application servers like WebLogic, WebSphere and Apache Tomcat.
- Followed AGILE Methodology and SCRUM to deliver the product with cross-functional skills.
- Used JUnit to test persistence and service tiers. Involved in unit test case preparation.
- Hands-on experience in software configuration / change control processes and tools like Subversion (SVN), Git, CVS and ClearCase.
- Worked closely with onshore and offshore team members during development when there were dependencies.
- Involved in sprint planning, code review, and daily standup meetings to discuss the progress of the application.
Environment: HTML, Ajax, Servlets, JSP, SQL, JavaScript, CSS, XML, SOAP, Tomcat Server, Hibernate, JDBC, Agile, Git, SVN.
Java Developer
Confidential
Responsibilities:
- Prepared Functional Requirement Specifications and performed coding, bug fixing and support.
- Involved in various phases of the Software Development Life Cycle (SDLC) such as requirement gathering, data modeling, analysis, architecture design and development for the project.
- Designed the front-end applications and user interface (UI) web pages using web technologies like HTML, XHTML, and CSS.
- Implemented GUI pages using JSP, JSTL, HTML, XHTML, CSS, JavaScript and AJAX.
- Involved in creation of a queue manager in WebSphere MQ along with the necessary WebSphere MQ objects required for use with WebSphere Data Interchange.
- Developed SOAP based Web Services for Integrating with the Enterprise Information System Tier.
- Used Ant scripts to automate application build and deployment processes.
- Involved in design, development and modification of PL/SQL stored procedures, functions, packages and triggers to implement business rules in the application.
- Used Struts MVC architecture and SOA to structure the project module logic.
- Developed ETL processes to load data from flat files, SQL Server and Access into the target Oracle database, applying business logic in transformation mappings for inserting and updating records on load.
- Good Informatica ETL development experience in an offshore and onsite model; involved in ETL code reviews and testing of ETL processes.
- Developed mappings in Informatica to load the data including facts and dimensions from various sources into the Data Warehouse, using different transformations like Source Qualifier, JAVA, Expression, Lookup, Aggregate, Update Strategy and Joiner.
- Scheduled sessions to extract, transform and load data into the warehouse database based on business requirements.
- Used the Struts MVC framework to develop a J2EE-based web application.
- Extensively used Java multi-threading to implement batch Jobs with JDK 1.5 features.
- Designed an entire messaging interface and Message Topics using WebLogic JMS.
- Implemented the online application using Core Java, JDBC, JSP, Servlets, Spring, Hibernate, Web Services, SOAP, and WSDL.
- Migrated datasource passwords to encrypted passwords using the Vault tool in all the JBoss application servers.
- Used Spring Framework for Dependency injection and integrated with the Hibernate framework.
- Developed Session Beans which encapsulates the workflow logic.
- Used JMS (Java Message Service) for asynchronous communication between different modules.
- Developed web components using JSP, Servlets and JDBC.
Environment: Java, J2EE, Servlets, HTML, XHTML, CSS, JavaScript, Struts 1.1, Spring, JSP, JMS, JBoss 4.0, SQL Server 2000, Ant, CVS, PL/SQL, Hibernate, Eclipse, Linux
