Big Data/Spark Developer Resume
Minneapolis
SUMMARY
- Around 6 years of professional IT experience in analysis, design, coding, development, and implementation across Java, analytics, and Big Data technologies, working with Apache Hadoop and Spark to efficiently solve Big Data processing requirements.
- Over 4 years of development experience with Big Data Hadoop ecosystem components and associated tools, covering data ingestion, import, export, storage, querying, pre-processing, and analysis of big data.
- Sound knowledge of Kafka, Cloudera HDFS, and MapReduce concepts.
- Experience in data analysis using Hive, Impala, and PySpark, and good knowledge of Azure technologies including Azure Data Lake and Azure Data Factory.
- Experience in developing Spark Streaming applications with Kafka.
- Experience in writing Sqoop jobs for importing data from and exporting data to RDBMS systems.
- In-depth understanding of Apache Spark job execution components such as the DAG, job scheduler, task scheduler, stages, and tasks.
- Experience in performance tuning of Spark applications through resource allocation, transformations that reduce shuffles, and configurations that increase data locality (see the sketch following this summary).
- Firm grip on data modeling, data marts, data mining, database performance tuning, and NoSQL (Cassandra, MongoDB) systems.
- Experience in using SDLC methodologies like Waterfall, Agile Scrum, and TDD for design and development.
- Extensively followed the Agile software development process and a test-driven development approach.
- Good working expertise in handling terabytes of structured and unstructured data in large cluster environments.
- Performed data ingress and egress using Sqoop and Azure Data Factory between HDFS and relational database systems.
- Worked on setting up Apache NiFi and performed a POC with NiFi for orchestrating a data pipeline.
- Experience with distributed systems, large-scale non-relational data stores and multi-terabyte data warehouses.
- Experienced in using Spark SQL with Scala to query different formats such as text, Avro, and Parquet files.
- Hands-on experience with Avro and Parquet file formats, dynamic partitions, and bucketing for best practices and performance improvement.
- Worked with data serialization formats such as Avro, Parquet, and JSON for converting complex objects into byte sequences.
- Experience in database design using stored procedures, functions, and triggers, and strong experience in writing complex queries for DB2, SQL Server, and Oracle.
- Applied knowledge of Git for source code version control and integrated it with Jenkins for CI/CD pipelines.
- Worked on quality tracking and user management with the Maven build tool as part of CI/CD.
- Experienced in Microsoft Business Intelligence tools, developing SSIS (Integration Services), SSAS (Analysis Services), and SSRS (Reporting Services) solutions, and building Key Performance Indicators and OLAP cubes.
- Proficient in writing complex SQL queries, stored procedures, DDL, cursors, and triggers, and in database design, including the creation and management of schemas and functions.
- Hands-on work with the reporting tool Tableau, creating attractive dashboards and worksheets.
- Experience in Java, web services, SOAP, HTML, and XML-related technologies.
- Experience in setting up Hadoop clusters on cloud platforms such as AWS, including identity and access management in AWS.
- Experience in working with small and large groups, successfully meeting new technical challenges and finding solutions that meet customer needs.
- Experience working both independently and collaboratively to solve problems and deliver high-quality results in a fast-paced, unstructured environment.
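To illustrate the Spark tuning and file-format experience summarized above, below is a minimal Scala sketch; the application name, paths, column name, and configuration values are illustrative assumptions rather than details from any specific engagement.

```scala
import org.apache.spark.sql.SparkSession

// A minimal sketch only; paths, column names, and config values are hypothetical.
object ParquetTuningSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("parquet-tuning-sketch")
      // Shuffle-partition count and Kryo serialization are typical knobs for
      // reducing shuffle overhead; real values depend on cluster and data size.
      .config("spark.sql.shuffle.partitions", "200")
      .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
      .getOrCreate()

    // Read a Parquet-backed dataset (Avro would follow the same pattern).
    val events = spark.read.parquet("/data/raw/events")

    // Writing with dynamic partitioning by a date column lets later queries
    // prune their scans to only the partitions they touch.
    events.write
      .mode("overwrite")
      .partitionBy("event_date")
      .parquet("/data/curated/events")

    spark.stop()
  }
}
```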
TECHNICAL SKILLS
Hadoop/Big Data Technologies: HDFS, MapReduce, Sqoop, Flume, Pig, Hive, Oozie, Impala, ZooKeeper, Spark, Kafka, NiFi
NoSQL Databases: HBase, MongoDB, Cassandra
Monitoring and Reporting: Tableau, Custom Shell Scripts, Power BI
Programming and Scripting: Java, C, C++, JavaScript, Shell Scripting, Python, Scala, Pig Latin, HiveQL, R Programming
Databases: Oracle, MySQL, MS SQL Server, DB2, Teradata
Analytics Tools: Tableau, Microsoft SSIS, SSAS, SSRS, Azure
Web Dev. Technologies: HTML, XML, JSON, CSS, jQuery, JavaScript
Operating Systems: Linux, HP-UX, Windows XP/Vista/7/8/10, Mac OS
Network Protocols: TCP/IP, UDP, HTTP, DNS, DHCP
Designing Tools: UML, Visio
PROFESSIONAL EXPERIENCE
Confidential, Minneapolis
Big Data/Spark Developer
Responsibilities:
- Worked on analyzing the Cloudera Hadoop cluster and different big data analytics tools, including Hive, the HBase NoSQL database, and Sqoop.
- Developed Spark code using Scala and Spark SQL for faster data processing.
- Used the in-memory computing capabilities of Spark with Scala and performed advanced procedures such as text analytics and processing.
- Performed performance tuning of Spark applications, setting the right batch interval, the correct level of parallelism, and appropriate memory settings.
- Developed MapReduce programs in Java running on YARN to perform various ETL, cleaning, and scrubbing tasks.
- Involved in loading data from Linux and UNIX file systems to HDFS.
- Experienced in importing and exporting data to and from HDFS using Sqoop and Flume.
- Processed datasets such as text, Parquet, Avro, fixed-width, Zip, JSON, and XML after loading them from the UNIX file system into HDFS.
- Extracted, parsed, cleaned, and ingested incoming web feed data and server logs into Azure Data Lake Store, handling both structured and unstructured data.
- Processed HDFS data, created external tables using Hive, and developed reusable scripts to ingest and repair tables across the project.
- Set up Apache NiFi and performed a POC with NiFi for orchestrating a data pipeline.
- Developed merge jobs in Python to extract and load data from SQL Server to HDFS.
- Developed Spark streaming jobs in Scala to build a streaming data platform integrated with Kafka (see the sketch after this project's environment line).
- Used Datameer for integration with Hadoop and other sources such as RDBMS (Oracle), SAS, Teradata, and flat files.
- Worked with AWS EC2, S3, and EMR for burst-capacity requirements.
- Installed and configured the Nagios check_postgresql.pl plugin for effectively monitoring PostgreSQL instances.
- Used AWS services such as EC2 and S3 for small data sets and migrated an existing on-premises application to AWS.
- Involved in file movement between HDFS and AWS S3 and worked extensively with S3 buckets in AWS.
- Managed the data warehouse and provided database and data mining technical support.
- Managed and supported the Cloudera/Hadoop platform, including capacity management, performance monitoring, user administration, troubleshooting, and resolution of technical issues.
- Worked on Hive to expose data for further analysis and to transform files from different analytical formats into text files.
- Built end-to-end CI/CD pipelines in Jenkins to retrieve code, compile applications, run tests, and push build artifacts to Nexus Artifactory.
- Actively involved in Requirements Gathering, Analysis, Development, Unit Testing, and Integration Testing.
- Involved in planning process of iterations under the Agile Scrum methodology.
Environment: HDFS, Apache Hadoop, MapReduce, Hive, Kafka, Cloudera, HBase, AWS, Teradata, Sqoop, Spark 2.1.0, RDBMS/DB, MySQL, PostgreSQL, CSV, Apache NiFi, Avro data files, Azure Data Lake.
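A minimal Structured Streaming sketch in Scala for the Kafka integration referenced above; the broker addresses, topic name, and output paths are hypothetical, and the actual jobs may have used the DStream-based Kafka API rather than Structured Streaming.

```scala
import org.apache.spark.sql.SparkSession

// Sketch only: broker list, topic, and paths are assumptions, not project details.
object KafkaStreamSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("kafka-stream-sketch")
      .getOrCreate()

    // Subscribe to a Kafka topic; requires the spark-sql-kafka connector on the classpath.
    val raw = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker1:9092,broker2:9092")
      .option("subscribe", "web-feed")
      .load()

    // Kafka delivers key/value as binary; cast the value to a string before parsing.
    val lines = raw.selectExpr("CAST(value AS STRING) AS line")

    // Land the stream as Parquet files, with checkpointing for fault tolerance.
    val query = lines.writeStream
      .format("parquet")
      .option("path", "/data/stream/web_feed")
      .option("checkpointLocation", "/checkpoints/web_feed")
      .start()

    query.awaitTermination()
  }
}
```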
Confidential, New York
Hadoop/Spark Developer
Responsibilities:
- Developed a framework to encrypt sensitive data (SSN, account number, etc.) in all kinds of datasets and moved datasets from one S3 bucket to another.
- Involved in the design and development of technical specifications.
- Developed multiple Spark jobs in Scala for data cleaning and preprocessing.
- Analyzed large data sets by running Hive queries and Pig scripts and developed simple/complex MapReduce jobs using Hive and Pig.
- Programmed in Hive, Spark SQL, Java, and Python to streamline incoming data and build data pipelines that yield useful insights, and orchestrated the pipelines using Azure Data Factory.
- Processed HDFS data, created external tables using Hive, and developed reusable scripts to ingest and repair tables across the project.
- Processed streaming data from Kafka topics using Scala and ingested the data into Cassandra.
- Worked extensively on Hadoop development and implementation with the Cloudera distribution; built an ingestion framework using Apache NiFi that ingests files, including financial data, from SFTP into HDFS.
- Used Pig and Azure as ETL tools for transformations, event joins, and pre-aggregations before storing the data in HDFS.
- Imported/exported data using Sqoop to load data from Teradata into HDFS/Hive on a regular basis.
- Responsible for backup, recovery, and upgrades of all PostgreSQL databases.
- Developed Pig scripts, Pig UDFs, Hive scripts, and Hive UDFs to load data files.
- Developed Python scripts to monitor the health of MongoDB databases and perform ad-hoc backups using mongodump and mongorestore.
- Involved in designing and developing enhancements of CSG using AWS APIs.
- Wrote Pig UDFs for converting date and timestamp formats from unstructured files into the required date formats and processed the results.
- Experience in using DbVisualizer, ZooKeeper, and Cloudera Manager.
- Designed and implemented a test environment on AWS and wrote and deployed AWS Lambda functions.
- Created buckets for each Hive table, clustered by client ID, for better performance when updating the tables (see the sketch after this project's environment line).
- Wrote Pig scripts to generate MapReduce jobs and performed ETL procedures on the data in HDFS.
- Used Sqoop to migrate data between HDFS and Oracle and deployed Hive-HBase integration to perform OLAP operations on HBase data.
- Imported and exported data to and from HDFS using Sqoop and Flume.
- Worked with cloud infrastructure such as Amazon Web Services (AWS) EC2 and S3.
- Wrote shell scripts to pull data from the Tumbleweed server to the cornerstone staging area.
- Worked closely with the Hadoop security and infrastructure teams to implement security.
- Implemented authentication and authorization services using the Kerberos authentication protocol.
Environment: Cloudera Hadoop, MapReduce, Hive, Pig, AWS, NiFi, Spring Batch, Azure, Scala, Teradata, Sqoop, Bash scripting, Spark RDD, Kafka, Spark SQL, PostgreSQL.
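A minimal Scala sketch of the client-ID bucketing approach referenced above, expressed with Spark's DataFrameWriter; the database, table, path, and bucket count are illustrative assumptions (plain HiveQL CLUSTERED BY DDL would be an equivalent route).

```scala
import org.apache.spark.sql.SparkSession

// Sketch only: "analytics" database, table name, path, and bucket count are assumptions.
object BucketedTableSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("bucketed-table-sketch")
      .enableHiveSupport()
      .getOrCreate()

    // Source data already landed in HDFS (hypothetical path).
    val txns = spark.read.parquet("/data/curated/transactions")

    // Bucketing by client_id pins each client's rows to a fixed bucket file,
    // so joins and updates keyed on client_id avoid a full shuffle.
    txns.write
      .mode("overwrite")
      .bucketBy(32, "client_id")
      .sortBy("client_id")
      .format("parquet")
      .saveAsTable("analytics.transactions_bucketed")

    spark.stop()
  }
}
```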
Confidential
Big Data Developer
Responsibilities:
- Established a secure data platform on on-premises Cloudera infrastructure; documented and built up ETL logic and data flows to facilitate easy usage of data assets.
- Updated details in Rally for all assigned user stories, reported status in daily stand-up meetings, and participated in sprint planning, grooming, showcase, and retrospective calls.
- Analyzed issues reported during UAT and coordinated with business users on resolutions.
- Implemented multiple MapReduce jobs in Java for data cleaning and pre-processing.
- Created external tables in Hive to store information from different sources for testing needs (see the sketch after this project's environment line).
- Troubleshot and fine-tuned Spark and Python-based applications for scalability and performance.
- Configured Sqoop jobs to load data from the Netezza database and other sources into Hive tables.
- Used Sqoop extensively to import/export data between Netezza and Hive tables, including incremental imports, and created Sqoop jobs keyed on the last saved value.
- Collected log data from web servers and ingested it into HDFS using Flume.
- Created Hive tables to store the processed results in a tabular format.
- Used Maven extensively to build JAR files of MapReduce programs and deployed them to the cluster.
- Imported other enterprise data from different data sources into HDFS.
- Configured the Oozie workflow engine to run multiple MapReduce, HiveQL, and Pig jobs.
- Used Oozie to orchestrate the MapReduce jobs that extract data on a timely schedule.
- Involved in loading data from one environment to another on the Hadoop platform.
- Developed multiple MapReduce jobs in Java for data cleaning and data mining.
- Worked with Infrastructure Engineers and System Administrators as suitable in designing the big-data infrastructure.
- Prepared UNIX shell scripts to test the application.
- Designed and developed Oozie workflows for timely data loading into the Hadoop ecosystem from other data sources.
Environment: Hive, HDFS, Cloudera, MapReduce, AWS, Flume, Pig, Spark Core, Spark SQL, Oozie, Oracle, YARN, Netezza, GitHub, JUnit, Linux, HBase, Azure, Sqoop, Java, Scala, Maven, Splunk, Eclipse.
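A minimal Scala/Spark SQL sketch of the external-table pattern referenced above; the database, table, columns, and HDFS location are illustrative assumptions, not taken from the actual project.

```scala
import org.apache.spark.sql.SparkSession

// Sketch only: assumes a "staging" database already exists; schema and path are hypothetical.
object ExternalTableSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("external-table-sketch")
      .enableHiveSupport()
      .getOrCreate()

    // Point an external Hive table at a directory populated by an upstream
    // Sqoop or Flume load, partitioned by load date.
    spark.sql("""
      CREATE EXTERNAL TABLE IF NOT EXISTS staging.web_logs (
        host STRING,
        url STRING,
        status INT
      )
      PARTITIONED BY (load_date STRING)
      STORED AS PARQUET
      LOCATION '/data/staging/web_logs'
    """)

    // Register any partitions written directly to HDFS since the last run.
    spark.sql("MSCK REPAIR TABLE staging.web_logs")

    spark.stop()
  }
}
```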
Confidential
Java Developer/Software Trainee
Responsibilities:
- Provided technical guidance to business analysts, gathered requirements, and converted them into technical specifications/artifacts.
- Developed code using JSP, HTML5, CSS3, JavaScript, and the jQuery library.
- Involved in all phases of the software development life cycle (SDLC) using Agile methodology.
- Involved in the development of user interfaces using HTML, JSP, JavaScript, the Dojo Toolkit, and CSS.
- Identified, recommended, and pursued technologies and practices relevant to the solution of highly complex projects.
- Involved in SDLC requirements gathering, analysis, design, development, and testing of the application using Agile methodology (Scrum).
- Played a key role in developing the business layer and data management components of this web-based system over a J2EE architecture.
- Designed and developed customer registration and login screens using HTML, servlets, and JavaScript.
- Configured Spring to manage Actions as beans, set their dependencies in a context file, and integrated the middle tier with Hibernate.
- Involved in unit testing of various modules by generating test cases.
- Involved in fixing bugs raised by the testing teams in various modules during the integration testing phase.
- Used the Java Message Service (JMS) for reliable and asynchronous exchange of important information such as payment status reports.
- Coordinated with all teams on functional requirements and ensured compliance with all architecture standards.
- Used JIRA and other defect/bug tracking tools with the team to improve communication and reduce defects in the product.
- Involved in the Agile methodology process, which included bi-weekly sprints and daily scrums to discuss design and work progress.
- Responsible for building, deploying, and version-controlling the code using GitHub.
Environment: JEE, Hibernate, web services, MVC, messaging systems, HTML5, CSS3, JavaScript, jQuery, REST web services, XML, Eclipse, MySQL, MongoDB, JIRA, UNIX.